Feature CNV-16692: OVN Secondary Network

Goal: Establish OVN as THE SDN for CNV to meet modern virtualization network needs.
This MUST be closely aligned to the OCP OVN effort.

With

Services
2nd NIC on primary/pod OVN network

Epic CNV-30298: UI for OVN Secondary Localnet Network

Goal

Allow administrators to create new Network Attachment Definitions for OVN Kubernetes secondary localnet networks.

Non-Requirements

<List of things not included in this epic, to alleviate any doubt raised during the grooming process.>

Notes

This should be a new item in the dropdown menu of the NetworkAttachmentDefinition dialog. When "OVN Kubernetes secondary localnet network" is selected, three additional fields should appear: "Bridge mapping", "MTU (optional)" and "VLAN (optional)". The "Bridge mapping" field should have a tooltip explaining what it is. The form should render into:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: <name>
  namespace: <namespace>
spec:
  config: |2
    {
      "cniVersion": "0.4.0",
      "name": "<bridge mapping>",
      "type": "ovn-k8s-cni-overlay",
      "topology": "localnet",
      "vlanID": <VLAN>,
      "mtu": <MTU>,
      "netAttachDefName": "<namespace>/<name>"
    }

The "vlanID" and "mtu" keys are set only if the user provides a value; otherwise they are omitted entirely, because the config is JSON and cannot carry comments.

Tooltip text: "Reference to a physical network name. A bridge mapping must be configured on cluster nodes to map between physical network names and Open vSwitch bridges."
Draft of the localnet documentation https://github.com/openshift/openshift-docs/blob/0fe4ec0bcc31afd4ba5060fb87dc9c347da34603/modules/configuring-localnet-switched-topology.adoc#L92
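A minimal Python sketch of how the dialog's fields could map onto the template above, assuming the form values arrive as plain strings/ints. The function name and example values (`localnet-a`, `physnet1`) are hypothetical; the actual console implementation (PR 13293) is TypeScript and may differ.

```python
import json


def render_localnet_nad(name, namespace, bridge_mapping, vlan=None, mtu=None):
    """Build a NetworkAttachmentDefinition dict for an OVN-Kubernetes localnet network."""
    config = {
        "cniVersion": "0.4.0",
        # Per the template, the CNI network name is the value of the
        # "Bridge mapping" field (the physical network name).
        "name": bridge_mapping,
        "type": "ovn-k8s-cni-overlay",
        "topology": "localnet",
        "netAttachDefName": f"{namespace}/{name}",
    }
    # Optional fields are omitted entirely when the user leaves them empty.
    if vlan is not None:
        config["vlanID"] = int(vlan)
    if mtu is not None:
        config["mtu"] = int(mtu)
    return {
        "apiVersion": "k8s.cni.cncf.io/v1",
        "kind": "NetworkAttachmentDefinition",
        "metadata": {"name": name, "namespace": namespace},
        # spec.config holds the CNI configuration as a JSON string.
        "spec": {"config": json.dumps(config, indent=2)},
    }


# Hypothetical usage: VLAN given, MTU left empty.
nad = render_localnet_nad("localnet-a", "default", "physnet1", vlan=100)
```

Building the config as a dict and serializing it at the end keeps the optional keys out of the output cleanly, instead of templating a string and patching commas.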

Done Checklist

DEV - Upstream code and tests merged: https://github.com/openshift/console/pull/13293
QE - Test plan: https://polarion.engineering.redhat.com/polarion/#/project/CNV/workitem?id=CNV-10548
QE - Test automation: https://gitlab.cee.redhat.com/cnv-qe/kubevirt-ui/-/merge_requests/539

https://github.com/openshift/console/pull/13293

Feature OBSDA-372: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Epic OU-224: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Task OU-241: Remove unused code from web console monitoring/ dir

Some of the code under the web console's /frontend/public/components/monitoring/ dir is no longer used, so it can be removed.

There may also be some code in the redux actions and reducers that are no longer used that can also be removed.

https://github.com/openshift/console/pull/13114

Feature OCPSTRAT-1002: Remove Terraform from the OpenStack installer-TP

Feature Overview (aka. Goal Summary)

As a result of Hashicorp's license change to BSL, Red Hat OpenShift needs to remove the use of Hashicorp's Terraform from the installer – specifically for OpenStack deployments which currently use Terraform for setting up the infrastructure.

To avoid an increased support overhead once the license changes at the end of the year, we want to provision OpenStack infrastructure without the use of Terraform.

Requirements (aka. Acceptance Criteria):

The OpenStack Installer no longer contains or uses Terraform.
The new provider should aim to provide the same results and have parity with the existing OpenStack Terraform provider. Specifically, we should aim for feature parity against the install config and the cluster it creates to minimize impact on existing customers' UX.

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

High-level list of items that are out of scope. Initial completion during Refinement status.

Background

Provide any additional context that is needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.

Epic OSASINFRA-2905: Move upstream CAPO to API version v1beta1

Goal

Move CAPO (cluster-api-provider-openstack) to a stable API.

Why is this important?

Currently OpenShift on OpenStack is using MAPO. This uses objects from the upstream CAPO project under the hood but not the APIs. We would like to start using CAPO and declare MAPO as deprecated and frozen, but before we do that upstream CAPO's own API needs to be declared stable.

Upstream CAPO's API is currently at v1alpha6. There are a number of incompatible changes already planned for the API which have prevented us from declaring it v1beta1. We should make those changes and move towards a stable API.

The changes need to be accompanied by an improvement in test coverage of API versions.

Upstream issues targeted for v1beta1 should be tracked in the v0.7 milestone: https://github.com/kubernetes-sigs/cluster-api-provider-openstack/issues?q=is%3Aopen+is%3Aissue+milestone%3Av0.7

Another option is to switch to cluster-capi-operator if it graduates, which would mean only a single API would be maintained.

Scenarios

N/A. This is purely upstream work for now. We will directly benefit from this work once we switch to CAPO in a future release.

Acceptance Criteria

Upstream CAPO provides a v1beta1 API
Upstream CAPO includes e2e tests using envtest (https://book.kubebuilder.io/reference/envtest.html) which will allow us to avoid breaks in API compatibility

Dependencies (internal and external)

None.

Previous Work (Optional):

N/A

Task OSASINFRA-3272: MAPO to use CAPO v1alpha7

To move forward, we need to bump the CAPO dependency in MAPO from v1alpha6 to v1alpha7.

https://github.com/openshift/machine-api-provider-openstack/pull/87

Feature OCPSTRAT-1009: Enable /dev/fuse in unprivileged containers within OpenShift

As an OpenShift admin, I want to leverage /dev/fuse in unprivileged containers so that cloud storage can be integrated into OpenShift applications in a secure, efficient, and scalable manner. This approach simplifies application architecture and allows developers to interact with cloud storage as if it were a local filesystem, all while maintaining strong security practices.

Epic OCPNODE-1942: Allow /dev/fuse by default in CRI-O

Epic Goal

Give users the ability to mount /dev/fuse into a pod by default with the `io.kubernetes.cri-o.Devices` annotation
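A minimal sketch of a workload using this, written as a Python dict that can be serialized to a pod manifest. It assumes CRI-O on the node permits the `io.kubernetes.cri-o.Devices` annotation (the default this epic targets); the pod name, container name, and image are hypothetical.

```python
def pod_with_fuse(name, image):
    """Return a pod manifest (as a dict) that requests /dev/fuse through CRI-O.

    Assumption: the node's CRI-O allows the io.kubernetes.cri-o.Devices
    annotation, so the container does not need to be privileged.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            # CRI-O reads this annotation and adds the device to the container.
            "annotations": {"io.kubernetes.cri-o.Devices": "/dev/fuse"},
        },
        "spec": {
            "containers": [
                {
                    "name": "builder",  # hypothetical container name
                    "image": image,
                    "command": ["sleep", "infinity"],
                    # Explicitly unprivileged: the device comes from the annotation.
                    "securityContext": {"privileged": False},
                }
            ]
        },
    }
```

Serialized with `json.dumps`, the result can be passed to `oc create -f`, which accepts JSON manifests as well as YAML.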

Why is this important?

It's the first step in a series of steps that allows users to run unprivileged containers within containers
It also gives access to faster builds within containers

Scenarios

As a developer on OpenShift, I would like to run builds within containers in a performant way.

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
...

Dependencies (internal and external)

Previous Work (Optional):

Open questions:

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Story OCPNODE-1943: give containers access to /dev/fuse by default in CRI-O

https://github.com/openshift/machine-config-operator/pull/4054

Feature OCPSTRAT-1027: Remove Terraform from OpenShift Installer. Cross-platform preparation work

Feature Overview (aka. Goal Summary)

As a result of Hashicorp's license change to BSL, Red Hat OpenShift needs to remove the use of Hashicorp's Terraform from the installer – specifically for IPI deployments which currently use Terraform for setting up the infrastructure.

To avoid an increased support overhead once the license changes at the end of the year, we want to provision OpenShift on the existing supported providers' infrastructure without the use of Terraform.

This feature will be used to track all the CAPI preparation work that is common for all the supported providers

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

High-level list of items that are out of scope. Initial completion during Refinement status.

Background

Provide any additional context that is needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

Interoperability Considerations

Epic CORS-2840: CAPI install without a management cluster

Epic Goal

Day 0 Cluster Provisioning
Compatibility with existing workflows that do not require a container runtime on the host

Why is this important?

This epic would maintain compatibility with existing customer workflows that do not have access to a management cluster and do not have the dependency of a container runtime

Scenarios

openshift-install running in customer automation

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
...

Dependencies (internal and external)

Previous Work (Optional):

Open questions:

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

https://github.com/openshift/installer/pull/7823

Story CORS-2852: Run CAPI control plane binaries locally

PoC & design for running CAPI control plane using binaries.

Feature OCPSTRAT-1057: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Epic OSASINFRA-3212: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Task OSASINFRA-3261: Create playbook task to create IPv6 subnet

https://github.com/openshift/installer/pull/7727

Feature OCPSTRAT-1059: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Epic OSASINFRA-3195: Control Plane with rootVolumes and etcd on local disk (TechPreview)

As a customer, I would like to deploy OpenShift on OpenStack using the IPI workflow, where my control plane would have 3 machines and each machine would use a root volume (a Cinder volume attached to the Nova server) and also an attached ephemeral disk using local storage that would only be used by etcd.

As this feature will be TechPreview in 4.15, this will only be implemented as a day 2 operation for now. This might or might not change in the future.

We know that etcd requires storage with strong performance capabilities, and a root volume backed by Ceph currently has difficulty providing them.

Also attaching local storage to the machine and mounting it for etcd would solve the performance issues we saw when customers used Ceph as the backend for the control plane disks.

Gophercloud already supports creating a server with multiple ephemeral disks:

https://github.com/gophercloud/gophercloud/blob/master/openstack/compute/v2/extensions/bootfromvolume/doc.go#L103-L151
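A sketch of the corresponding Nova `block_device_mapping_v2` entries in Python: one entry boots from a Cinder volume created from an image, and a second `blank`/`local` entry adds an ephemeral disk on the hypervisor's local storage for etcd. The image ID and sizes are placeholders and the helper name is hypothetical; this is not the CAPO or MAPO API, which the epic has yet to define.

```python
def server_block_devices(image_id, root_gb, etcd_gb):
    """Nova block_device_mapping_v2 entries: Cinder root volume plus a local ephemeral disk."""
    return [
        {
            # Root disk: a Cinder volume created from the image; the server boots from it.
            "boot_index": 0,
            "uuid": image_id,
            "source_type": "image",
            "destination_type": "volume",
            "volume_size": root_gb,
            "delete_on_termination": True,
        },
        {
            # Data disk: a blank disk on local hypervisor storage, intended for etcd.
            # boot_index -1 marks it as non-bootable.
            "boot_index": -1,
            "source_type": "blank",
            "destination_type": "local",
            "volume_size": etcd_gb,
            "delete_on_termination": True,
        },
    ]
```

These entries would go under `server.block_device_mapping_v2` in the Nova create-server request body; the guest then sees the ephemeral disk as an extra block device.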

We need to figure out how to address this in CAPO, probably with a new API that would later be used in OpenShift (MAPO, and probably the installer).

We'll also have to update the OpenStack Failure Domain in CPMS.

ARO (Azure) has conducted some benchmarks and now recommends putting etcd on a separate data disk:

https://docs.google.com/document/d/1O_k6_CUyiGAB_30LuJFI6Hl93oEoKQ07q1Y7N2cBJHE/edit

Also interesting thread: https://groups.google.com/u/0/a/redhat.com/g/aos-devel/c/CztJzGWdsSM/m/jsPKZHSRAwAJ

Task OSASINFRA-3243: MAPO - Additional Data Volumes in OpenStack Machines

Once we have defined an API for data volumes, we'll need to add support for this new API in MAPO so the user can update their Machines on day 2 to be redeployed with etcd on local disk.

https://github.com/openshift/machine-api-provider-openstack/pull/88

Task OSASINFRA-3280: DOC: list of things to document for etcd on local disk

Day 2 install is documented here (this document was originally created for QE, as a FID).
We need to document that, when using rootVolumes for the control plane, etcd should be placed on a local ephemeral disk, and document how to do so.
We also need to update https://docs.openshift.com/container-platform/4.13/scalability_and_performance/recommended-performance-scale-practices/recommended-etcd-practices.html#move-etcd-different-disk_recommended-etcd-practices with 2 adjustments: the command used is `mkfs.xfs -f`, and the device is /dev/vdb.

Feature OCPSTRAT-1060: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Epic OSASINFRA-3210: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Task OSASINFRA-3237: Installer: make controlPlanePort field GA

We only allow usage of controlPlanePort as a TechPreview feature. We should move it to GA.

https://github.com/openshift/installer/blob/4e4eec6f3a8630b3d2d83020fac6eaa420707085/pkg/types/openstack/platform.go#L123-L128

Open questions:

Does this imply removing support for machinesSubnet from the CI jobs and documentation?

https://github.com/openshift/installer/pull/7570

Feature OCPSTRAT-1061: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Epic OSASINFRA-3228: Kuryr Support EOL

Goal

The goal is to remove Kuryr from the payload and as an SDN option.

Why is this important?

Kuryr is deprecated and a migration path exists; dropping it frees up a lot of resources.

Acceptance Criteria

CI removed in 4.15.
Images no longer part of the payload.
Installer not accepting Kuryr as SDN option.
Docs and release notes updated.

Task OSASINFRA-3290: Remove Kuryr from must-gather

Kuryr is no longer supported in 4.15 and there cannot be a 4.15 cluster with Kuryr, either a new one or upgraded. Therefore we want to remove Kuryr from must-gather.

https://github.com/openshift/must-gather/pull/393

Task OSASINFRA-3291: Remove Kuryr from origin tests

https://github.com/openshift/origin/pull/28397

Task OSASINFRA-3303: Installer: remove generation of trunks name

Since Kuryr has been removed, we no longer need to generate trunk names, so that code can be removed.

https://github.com/openshift/installer/pull/7772

Task OSASINFRA-3229: Remove Kuryr option from the installer

The installer should no longer accept Kuryr as the networkType. If a user chooses it, the installer should show a clear error stating that Kuryr is no longer supported.
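The intended validation can be sketched in Python as a check on `networking.networkType`. The exception class, function name, and error text are hypothetical; the real installer is written in Go and its message may differ. Matching case-insensitively is an assumption made here so that `kuryr` is rejected too.

```python
class InvalidNetworkType(ValueError):
    """Raised when install-config requests an unsupported network type (hypothetical)."""


def validate_network_type(network_type):
    """Reject the removed Kuryr network type with an explicit, actionable message."""
    # Case-insensitive comparison is an assumption of this sketch.
    if network_type.strip().lower() == "kuryr":
        raise InvalidNetworkType(
            "networking.networkType: Kuryr is no longer supported; "
            "use OVNKubernetes instead"
        )
    return network_type
```

Failing early with a message that names the field and the replacement is what makes the error "clear" compared with a generic "unsupported value".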

https://github.com/openshift/installer/pull/7675

Task OSASINFRA-3289: Remove Kuryr from the cluster-cloud-controller-manager-operator

https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/294

Feature OCPSTRAT-110: Hypershift-enablement for short-lived token authentication flows with OLM-managed operators with CCO

Feature Overview:

Hypershift-provisioned clusters, regardless of the cloud provider, support the proposed integration for OLM-managed operators outlined in ~~OCPBU-559~~ and ~~OCPBU-560~~.

Goals

There is no degradation in the capability or coverage of short-lived token authentication support for OLM-managed operators on clusters that are lifecycled via Hypershift.

Requirements:

the flows in ~~OCPBU-559~~ and ~~OCPBU-560~~ need to work unchanged on Hypershift-managed clusters
most likely this means that Hypershift needs to adopt the CloudCredentialOperator
all operators enabled as part of ~~OCPBU-563~~, OCPBU-564, ~~OCPBU-566~~ and OCPBU-568 need to be able to leverage short-lived authentication on Hypershift-managed clusters without being aware that they are on Hypershift-managed clusters
also OCPBU-569 and OCPBU-570 should be achievable on Hypershift-managed clusters

Background

Currently, Hypershift lacks support for CCO.

Customer Considerations

Currently, Hypershift will be limited to deploying clusters in which the cluster core operators are leveraging short-lived token authentication exclusively.

Documentation Considerations

If we are successful, no special documentation should be needed for this.

Epic CCO-386: HyperShift Integration

Outcome Overview

Operators on guest clusters can take advantage of the new tokenized authentication workflow that depends on CCO.

Success Criteria

CCO is included in HyperShift and its footprint is minimal while meeting the above outcome.

Expected Results (what, how, when)

Post Completion Review – Actual Results

After completing the work (as determined by the "when" in Expected Results above), list the actual results observed / measured during Post Completion review(s).

Story CCO-511: Add CCO to HyperShift Management plane

This is a clone of issue ~~CCO-388~~. The following is the description of the original issue:
—
Every guest cluster should have a running CCO pod with its kubeconfig attached to it.

Enhancement doc: https://github.com/openshift/enhancements/blob/master/enhancements/cloud-integration/tokenized-auth-enablement-operators-on-cloud.md

https://github.com/openshift/hypershift/pull/3336

Story CCO-421: Separate Webhook Logic & RBAC From Core CCO

CCO logic for managing webhooks is a) entirely separate from the core functionality of the CCO and b) requires a lot of extra RBAC. In deployment topologies like HyperShift, we don't want this additional functionality and would like to be able to cleanly turn it off and remove the excess RBAC.

https://github.com/openshift/cloud-credential-operator/pull/592

Feature OCPSTRAT-1203: [Tech Preview] OpenShift on Oracle Cloud Infrastructure (OCI) Bare metal

BU Priority Overview

Enable installation and lifecycle support of OpenShift 4 on Oracle Cloud Infrastructure (OCI) Bare metal

Goals

Validating OpenShift on OCI baremetal to make it officially supported.
Enable installation of OpenShift 4 on OCI bare metal using Assisted Installer.
Provide published installation instructions for how to install OpenShift on OCI baremetal
OpenShift 4 on OCI baremetal can be updated, resulting in a cluster and applications that are in a healthy state when the update is completed.
Telemetry reports back on clusters using OpenShift 4 on OCI baremetal for connected OpenShift clusters (e.g. platform=external or none + some other indicator to know it's running on OCI baremetal).

Use scenarios

As a customer, I want to run OpenShift Virtualization on OpenShift running on OCI baremetal.
As a customer, I want to run Oracle BRM on OpenShift running on OCI baremetal.

Why is this important

Customers who want to move from on-premises to Oracle cloud baremetal
OpenShift Virtualization is currently only supported on baremetal

Requirements

- OCI Bare Metal Shapes must be certified with RHEL.
  Notes: It must also work with RHCOS (see iSCSI boot notes), as OCI BM standard shapes require RHCOS iSCSI to boot (OCPSTRAT-1246). Certified shapes: https://catalog.redhat.com/cloud/detail/249287
- Successfully passing the OpenShift Provider conformance testing; the results should be fairly similar to the OCI VM test results.
  Notes: Oracle will do these tests.
- Updating Oracle Terraform files.
- Making the Assisted Installer modifications needed to address the CCM changes and surface the necessary configurations.
  Notes: Support Oracle Cloud in Assisted-Installer CI: ~~MGMT-14039~~

RFEs:

~~RFE-3635~~ - Supporting Openshift on Oracle Cloud Infrastructure(OCI) & Oracle Private Cloud Appliance (PCA)

OCI Bare Metal Shapes to be supported

Any bare metal Shape to be supported with OCP has to be certified with RHEL.

From the certified Shapes, those that have local disks will be supported. This is due to the current lack of support in RHCOS for the iSCSI boot feature. ~~OCPSTRAT-749~~ is tracking adding this support and removing this restriction in the future.

As of Aug 2023 this excludes at least all the Standard shapes, BM.GPU2.2 and BM.GPU3.8, from the published list at: https://docs.oracle.com/en-us/iaas/Content/Compute/References/computeshapes.htm#baremetalshapes

Assumptions

Pre-requisite: RHEL certification which includes RHEL and OCI baremetal shapes (instance types) has successfully completed.

Epic MGMT-15721: Oracle cloud improvements

Feature goal (what are we trying to solve here?)

Please describe what this feature is going to do.

DoD (Definition of Done)

Please describe what conditions must be met in order to mark this feature as "done".

Does it need documentation support?

If the answer is "yes", please make sure to check the corresponding option.

Feature origin (who asked for this feature?)

A Customer asked for it

- Name of the customer(s)
- How many customers asked for it?
- Can we have a follow-up meeting with the customer(s)?

A solution architect asked for it

- Name of the solution architect and contact details
- How many solution architects asked for it?
- Can we have a follow-up meeting with the solution architect(s)?

Internal request

- Who asked for it?

Catching up with OpenShift

Reasoning (why it’s important?)

Please describe why this feature is important
How does this feature help the product?

Competitor analysis reference

Do our competitors have this feature?
- Yes, they have it and we can have some reference
- No, it's unique or explicit to our product
- No idea. Need to check

Feature usage (do we have numbers/data?)

We have no data - the feature doesn’t exist anywhere
Related data - the feature doesn’t exist but we have info about the usage of associated features that can help us
- Please list all related data usage information
We have the numbers and can relate to them
- Please list all related data usage information

Feature availability (why should/shouldn't it live inside the UI/API?)

Please describe the reasoning behind why it should/shouldn't live inside the UI/API
If it's for a specific customer we should consider using AMS
Does this feature exist in the UI of other installers?

Task MGMT-15796: [BE] add cloud controller manager setting into install-config when deploying oci platform

https://github.com/openshift/installer/pull/7457 introduces a change of behavior, the Cloud Controller Manager will be disabled by default.

We need to explicitly enable it when deploying on oci platform.
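For illustration, the setting can be sketched as a Python helper that builds the install-config `platform` section. The `external.platformName: oci` and `cloudControllerManager: External` fields reflect the installer's external platform as we understand it; the helper name and shape are assumptions, not the code in the linked assisted-service PR.

```python
def oci_platform_section(enable_ccm=True):
    """Return the install-config `platform` section for OCI via the external platform."""
    external = {"platformName": "oci"}
    if enable_ccm:
        # Since installer PR 7457 the CCM is disabled unless requested explicitly.
        external["cloudControllerManager"] = "External"
    return {"external": external}


# Hypothetical usage: merge into an install-config dict before rendering it to YAML.
install_config = {"apiVersion": "v1", "platform": oci_platform_section()}
```

Leaving the key out (rather than writing an empty value) keeps the "CCM disabled" case identical to the installer's new default.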

https://github.com/openshift/assisted-service/pull/5548

Epic MGMT-16167: Assisted-installer: support booting from iSCSI in 4.15 for OCI

Feature goal (what are we trying to solve here?)

During 4.15, the OCP team is working on allowing booting from iscsi. Today that's disabled by the assisted installer. The goal is to enable that for ocp version >= 4.15 when using OCI external platform.

DoD (Definition of Done)

iscsi boot is enabled for ocp version >= 4.15 both in the UI and the backend.

When booting from iscsi, we need to make sure to add the `rd.iscsi.firmware=1 ip=ibft` kargs during install to enable iSCSI booting.
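A small Python helper illustrating this step, assuming kernel arguments are handled as a list of strings. The function and constant names are hypothetical, and the real assisted-service code may differ.

```python
# Kernel arguments required for booting from iSCSI via iBFT firmware.
ISCSI_BOOT_KARGS = ["rd.iscsi.firmware=1", "ip=ibft"]


def install_kargs(boots_from_iscsi, existing=None):
    """Return install kernel arguments, appending the iSCSI ones once when needed."""
    kargs = list(existing or [])  # copy so the caller's list is not mutated
    if boots_from_iscsi:
        for arg in ISCSI_BOOT_KARGS:
            if arg not in kargs:  # avoid duplicates if already present
                kargs.append(arg)
    return kargs
```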

Does it need documentation support?

yes

Feature origin (who asked for this feature?)

A Customer asked for it

- Oracle

Reasoning (why it’s important?)

In OCI there are bare metal instances with iSCSI support, and we want to allow customers to use them.

Task MGMT-16320: Assisted agent should connect to iBFT iSCSI targets

When the Assisted agent boots, it should connect to iBFT iSCSI targets

https://github.com/openshift/assisted-service/pull/5753

Task MGMT-16273: Allow boot from iSCSI in assisted-service for OCI

Assisted Service should allow booting from iSCSI for x86_64 OpenShift versions at least 4.15.0.

Multipath is not supported at this time.
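The rules stated here (x86_64 only, OpenShift 4.15.0 or later, no multipath) can be captured in a small predicate. A hypothetical Python sketch; it treats pre-release versions such as `4.15.0-rc.1` as 4.15.0, which is an assumption rather than assisted-service behavior.

```python
def parse_version(version):
    """Turn '4.15.0' or '4.15.0-rc.1' into a comparable (major, minor, patch) tuple."""
    core = version.split("-", 1)[0]  # drop any pre-release suffix (assumption)
    parts = [int(p) for p in core.split(".")]
    while len(parts) < 3:
        parts.append(0)  # "4.16" -> (4, 16, 0)
    return tuple(parts[:3])


def iscsi_boot_allowed(ocp_version, cpu_arch, multipath):
    """Hypothetical check mirroring the rules in this task."""
    if multipath:
        return False  # multipath is not supported at this time
    return cpu_arch == "x86_64" and parse_version(ocp_version) >= (4, 15, 0)
```

Comparing integer tuples avoids the classic string-comparison bug where `"4.9" > "4.15"`.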

https://github.com/openshift/assisted-service/pull/5728

Feature OCPSTRAT-1282: Enhanced MCO State Reporting GA

Feature Overview (aka. Goal Summary)

The MCO should properly report its state in a way that's consistent and able to be understood by customers, troubleshooters, and maintainers alike.

Some customer cases have revealed scenarios where the MCO state reporting is misleading and therefore could be unreliable to base decisions and automation on.

In addition to correcting some incorrect states, the MCO will be enhanced for a more granular view of update rollouts across machines.

Epic MCO-836: state reporting GA

stub

Bug OCPBUGS-5452: Taking much time to update node count for MCP

Description of problem:

The MCO takes too long to update the node count of an MCP when the label the MCP uses to match nodes is removed from a node.

Version-Release number of selected component (if applicable):

How reproducible:

100%

Steps to Reproduce:

1. Remove `node-role.kubernetes.io/worker=` label from any worker node.
~~~
# oc label node worker-0.sharedocp4upi411ovn.lab.upshift.rdu2.redhat.com node-role.kubernetes.io/worker-
~~~
2. Check MCP worker for correct node count.
~~~
# oc get mcp  worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-6916abae250ad092875791f8297c13e1   True      False      False      3              3                   3                     0                      5d7h
~~~
3. Check again after 10-15 minutes.
~~~
# oc get mcp  worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-6916abae250ad092875791f8297c13e1   True      False      False      2              2                   2                     0                      5d7h
~~~

Actual results:

It took 10-15 mins for MCP to detect node removal.

Expected results:

It should detect the node removal as soon as the matching label is removed from the node.
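The expected behavior boils down to re-evaluating the pool's label selector whenever a node's labels change. A minimal Python model of that count, assuming a plain `matchLabels` selector; it does not model the MCO's controller, its watches, or its resync timing, and the helper names are hypothetical.

```python
# Label selector of the default worker MachineConfigPool.
WORKER_SELECTOR = {"node-role.kubernetes.io/worker": ""}


def matches(selector, labels):
    """matchLabels semantics: every key must be present with exactly this value."""
    return all(k in labels and labels[k] == v for k, v in selector.items())


def machine_count(selector, nodes):
    """Count the nodes whose labels satisfy the pool's selector."""
    return sum(1 for labels in nodes.values() if matches(selector, labels))


def remove_label(nodes, node, key):
    """In-memory equivalent of `oc label node <node> <key>-`."""
    nodes[node].pop(key, None)
```

Because the count is a pure function of current labels, recomputing it on every node label update event would reflect the change immediately instead of waiting for a periodic resync.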

Additional info:

https://github.com/openshift/machine-config-operator/pull/4097

Feature OCPSTRAT-1286: [Tech Preview] Cluster API Provider for vSphere

Feature Overview

Move to using the upstream Cluster API (CAPI) in place of the current implementation of the Machine API for standalone Openshift

Prerequisite work goals completed in ~~OCPSTRAT-1122~~:
Complete the design of the Cluster API (CAPI) architecture and build the core operator logic needed for Phase-1, incorporating the assets from different repositories to simplify asset management.

Phase 1 & 2 covers implementing base functionality for CAPI.

Background, and strategic fit

Initially CAPI did not meet the requirements for cluster/machine management that OCP had; the project has since moved on, and CAPI is now a better fit and also has better community involvement.
CAPI has much better community interaction than MAPI.
Other projects are considering using CAPI and it would be cleaner to have one solution
Long term it will allow us to add new features more easily in one place vs. doing this in multiple places.

Acceptance Criteria

There must be no negative effect on customers/users of the MAPI. This API must continue to be accessible to them; how it is implemented "under the covers", and whether that implementation leverages CAPI, is open.

Epic OCPCLOUD-1841: CAPI providers: vSphere Tech Preview

sets up CAPI ecosystem for vSphere

Spike OCPCLOUD-1609: Investigate what is required to run CAPI vSphere MachineSets

So far we haven't tested this provider at all. We need to run it and identify any issues with it.

Steps:

Try to run the vSphere provider on OCP
Try to create MachineSets
Try various features out and note down bugs
Create stories for resolving issues upstream and downstream

Outcome:

Create stories in the epic for the vSphere items that need to be resolved

Feature OCPSTRAT-139: Networking dashboards (non-flows)

View the Description

Build more networking-related dashboards (monitoring dashboards accessible from the Observe > Dashboards menu in the admin perspective).

Start with just a couple of areas:

Host network dashboard (using node-exporter network / netstat metrics - related to CMO)
OVN/OVS health dashboard (using ovn/ovs metrics)
Ingress dashboard (routes, shards stats) related to Ingress operator / netedge team
(DNS dashboard, if time permits)

More info/discussion in this work doc: https://docs.google.com/document/d/1ByNIJiOzd6w5csFYpC27NdOydnBg8Tx45uL4-7v-aCM/edit

Epic NETOBSERV-991: Networking dashboards: OVN / host

View the Description

Build more networking-related dashboards (monitoring dashboards accessible from the Observe > Dashboards menu in the admin perspective).

Start with just a couple of areas:

Host network dashboard (using node-exporter network / netstat metrics - related to CMO)
OVN/OVS health dashboard (using ovn/ovs metrics)

More info/discussion in this work doc: https://docs.google.com/document/d/1ByNIJiOzd6w5csFYpC27NdOydnBg8Tx45uL4-7v-aCM/edit

Martin Kennelly is our contact point from the SDN team

Story NETOBSERV-1047: Cluster network stats & health dashboard (OVN+host)

View the Description View the linked PRs

Create a dashboard from the CNO

cf https://docs.google.com/document/d/1ByNIJiOzd6w5csFYpC27NdOydnBg8Tx45uL4-7v-aCM/edit#heading=h.as5l4d8fepgw

Current metrics documentation:

Include metrics for:

pod/svc/netpol setup latency
ovs/ovn CPU and memory
network stats: rx/tx bytes, drops, errs per interface (not all interfaces are monitored by default, but they're going to be more configurable via another task: ~~NETOBSERV-1021~~)
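As an illustration of the per-interface network stats item, the Go sketch below builds PromQL rate queries over the standard node-exporter interface counters. The metric names are node-exporter's; the `sum by (instance, device)` aggregation, the 5m window and the `rateQuery` helper are assumptions for illustration, not what the CNO dashboard actually ships.

```go
package main

import "fmt"

// interfaceMetrics lists node-exporter counters for per-interface traffic,
// drops and errors. Whether the dashboard uses exactly these is an assumption.
var interfaceMetrics = []string{
	"node_network_receive_bytes_total",
	"node_network_transmit_bytes_total",
	"node_network_receive_drop_total",
	"node_network_transmit_drop_total",
	"node_network_receive_errs_total",
	"node_network_transmit_errs_total",
}

// rateQuery builds a PromQL expression returning the per-second rate of a
// counter, summed per node (instance) and interface (device).
func rateQuery(metric, window string) string {
	return fmt.Sprintf("sum by (instance, device) (rate(%s[%s]))", metric, window)
}

func main() {
	for _, m := range interfaceMetrics {
		fmt.Println(rateQuery(m, "5m"))
	}
}
```

Only interfaces that node-exporter actually monitors will show up in these series, which is why making the interface list configurable matters.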

https://github.com/openshift/cluster-network-operator/pull/1871

Feature OCPSTRAT-145: Perf/Scale improvements and testing of OVN-Kubernetes

View the Description

Continue scale testing and performance improvements for ovn-kubernetes

Epic SDN-4152: [4.15] Perf-scale improvements and testing of OVN-Kubernetes

View the Description

Template:

Networking Definition of Planned

Epic Template descriptions and documentation

Epic Goal

Manage OpenShift Virtual Machine IP addresses from within the SDN solution provided by OVN-Kubernetes.

Why is this important?

Customers want to offload IPAM from their custom solutions (e.g. a custom DHCP server running on their cluster network) to the SDN.

Planning Done Checklist

The following items must be completed on the Epic prior to moving the Epic from Planning to the ToDo status

Priority is set by engineering
Epic must be linked to a Parent Feature
Target version must be set
Assignee must be set
Enhancement Proposal is Implementable
No outstanding questions about major work breakdown
Are all Stakeholders known? Have they all been notified about this item?
Does this epic affect SD? Have they been notified? (View plan definition for current suggested assignee)
1. Please use the “Discussion Needed: Service Delivery Architecture Overview” checkbox to facilitate the conversation with SD Architects. The SD architecture team monitors this checkbox which should then spur the conversation between SD and epic stakeholders. Once the conversation has occurred, uncheck the “Discussion Needed: Service Delivery Architecture Overview” checkbox and record the outcome of the discussion in the epic description here.
2. The guidance here is that unless it is very clear that your epic doesn’t have any managed services impact, default to use the Discussion Needed checkbox to facilitate that conversation.

Additional information on each of the above items can be found here: Networking Definition of Planned

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.

...

Dependencies (internal and external)

...

Previous Work (Optional):

1. …

Open questions:

1. …

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Story SDN-4173: egressFirewall: decrease the number of OVS flows per node

View the Description View the linked PRs

Using a source port group instead of an address set will decrease the number of OVS flows per node.
Needs to be backported to 4.14.
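A minimal sketch of the two match styles, written in OVN's logical-flow match syntax, where `$name` refers to an address set and `@name` to a port group. The helper functions and names are hypothetical and do not reproduce the match strings ovn-kubernetes generates. The intent is only to show why a port-group match can be cheaper: an address-set match is expanded for every address in the set, while `inport == @pg` may only need flows for the group's ports bound on the local node.

```go
package main

import "fmt"

// addressSetMatch matches traffic by source IP against an address set
// ($name). Each address in the set can become its own OpenFlow flow, and the
// flows are needed on every node where the logical switch is present.
func addressSetMatch(addressSet, dst string) string {
	return fmt.Sprintf("ip4.src == $%s && ip4.dst == %s", addressSet, dst)
}

// portGroupMatch matches the same traffic by source logical port instead,
// using a port group (@name). A node may only need flows for the group's
// ports that are bound locally.
func portGroupMatch(portGroup, dst string) string {
	return fmt.Sprintf("inport == @%s && ip4.dst == %s", portGroup, dst)
}

func main() {
	// Hypothetical names; ovn-kubernetes generates its own.
	fmt.Println(addressSetMatch("as_example", "10.0.0.0/24"))
	fmt.Println(portGroupMatch("pg_example", "10.0.0.0/24"))
}
```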

https://github.com/openshift/ovn-kubernetes/pull/1952

Story SDN-4189: Create a Documentation/KCS that outlines the specific issues SD could run into with NetworkPolicies

View the Description View the linked PRs

Theme: Ensure 4.12 SD is as stable as 4.13 SD. Identify what is present in 4.14/4.13 but missing in 4.12 from the OVN-Kubernetes point of view.

We need to come up with a KCS article for 4.12/4.13 about network policy issues. Some things it should cover are:

an extensive list of what could go wrong, such as using except blocks and port ranges in network policies
sample network policy YAMLs that showcase these patterns, and an explanation of the resulting flow explosion at the OVN/OVS level
how to detect issues via alerts (ACL counts? OVS CPU?)
whether there are other ways to express the same policy better, as a workaround
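To illustrate the port-range part of the flow-explosion discussion, the sketch below splits an inclusive port range into the value/mask pairs that a bitwise OpenFlow match can express. It is not OVN's code, and how OVN translates NetworkPolicy `endPort` ranges depends on the OVN version; it only shows the arithmetic behind why one range can turn into several flows.

```go
package main

import "fmt"

// rangeToMasks splits the inclusive port range [lo, hi] into the minimal list
// of value/mask pairs. Each pair may need its own flow, so a single range can
// expand into many flows.
func rangeToMasks(lo, hi uint16) []string {
	var out []string
	start, end := uint32(lo), uint32(hi)
	for start <= end {
		// Largest power-of-two block aligned at start (the lowest set bit),
		// or the whole 16-bit space when start is 0.
		size := start & (^start + 1)
		if start == 0 {
			size = 1 << 16
		}
		// Shrink the block until it fits inside the range.
		for start+size-1 > end {
			size >>= 1
		}
		mask := uint16(^(size - 1))
		out = append(out, fmt.Sprintf("%d/0x%04x", start, mask))
		start += size
	}
	return out
}

func main() {
	masks := rangeToMasks(1000, 2000)
	fmt.Println(len(masks), masks)
}
```

For example, ports 1000-2000 need eight value/mask pairs, and the range 1-65535 needs sixteen, while the full range 0-65535 needs only one.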

Check the existing network policies used by SD MCs and review them to make sure they are efficient

Talk about how the new OVN 23.06 will fix the except-block issue and, if we need to backport the port range fixes, cover those too

Verify that the fix works.

Goal: The end result should be a document, plus backports if needed outside of the OVN bump planned as part of https://issues.redhat.com/browse/OCPBUGS-22091

See https://issues.redhat.com/browse/OCPBUGS-22091?focusedId=23320502&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-23320502 for details.

Feature OCPSTRAT-178: Allow adding ipv6 VIPs to existing dual stack clusters via Mutable Infrastructure

View the Description

Goal:
Support enablement of dual-stack VIPs on existing clusters created as dual-stack but at a time when it was not possible to have both v4 and v6 VIPs at the same time.

Why is this important?
This is a followup to SDN-2213 ("Support dual ipv4 and ipv6 ingress and api VIPs").

We expect that customers with existing dual stack clusters will want to make use of the new dual stack VIPs fixes/enablement, but it's unclear how this will work because we've never supported modifying on-prem networking configuration after initial deployment. Once we have dual stack VIPs enabled, we will need to investigate how to alter the configuration to add VIPs to an existing cluster.

We will need to make changes to the VIP fields in the Infrastructure and/or ControllerConfig objects. Infrastructure would be the first option since that would make all of the fields consistent, but that relies on the ability to change that object and have the changes persist and be propagated to the ControllerConfig. If that's not possible, we may need to make changes just in ControllerConfig.
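A minimal sketch of the kind of check the VIP change implies, assuming an existing VIP list holding one IPv4 address that gains one IPv6 entry. The function name and the rule that the existing IPv4 VIP stays first are assumptions for illustration, not the validation OpenShift actually performs.

```go
package main

import (
	"fmt"
	"net/netip"
)

// addIPv6VIP (hypothetical) appends an IPv6 VIP to a single-stack IPv4 VIP
// list, keeping the existing IPv4 VIP first so current consumers are unaffected.
func addIPv6VIP(existing []string, v6 string) ([]string, error) {
	if len(existing) != 1 {
		return nil, fmt.Errorf("expected exactly one existing VIP, got %d", len(existing))
	}
	v4, err := netip.ParseAddr(existing[0])
	if err != nil || !v4.Is4() {
		return nil, fmt.Errorf("existing VIP %q is not an IPv4 address", existing[0])
	}
	addr, err := netip.ParseAddr(v6)
	if err != nil || !addr.Is6() || addr.Is4In6() {
		return nil, fmt.Errorf("new VIP %q is not an IPv6 address", v6)
	}
	return []string{existing[0], addr.String()}, nil
}

func main() {
	vips, err := addIPv6VIP([]string{"192.0.2.10"}, "2001:db8::10")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(vips)
}
```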

Epic OPNET-340: Mechanism to change Infrastructure values

View the Description

For epics https://issues.redhat.com/browse/OPNET-14 and https://issues.redhat.com/browse/OPNET-80 we need a mechanism to change configuration values related to our static pods. Today that is not possible because all of the values are put in the status field of the Infrastructure object.

We had previously discussed this as part of https://issues.redhat.com/browse/OPNET-21 because there was speculation that people would want to move from internal LB to external, which would require mutating a value in Infrastructure. In fact, there was a proposal to put that value in the spec directly and skip the status field entirely, but that was discarded because a migration would be needed in that case and we need separate fields to indicate what was requested and what the current state actually is.

There was some followup discussion about that with Joel Speed from the API team (which unfortunately I have not been able to find a record of yet) where it was concluded that if/when we want to modify Infrastructure values we would add them to the Infrastructure spec and when a value was changed it would trigger a reconfiguration of the affected services, after which the status would be updated.

This means we will need new logic in MCO to look at the spec field (currently there are only fields in the status, so spec is ignored completely) and determine the correct behavior when they do not match. This will mean the values in ControllerConfig will not always match those in Infrastructure.Status. That's about as far as the design has gone so far, but we should keep the three use cases we know of (internal/external LB, VIP addition, and DNS record overrides) in mind as we design the underlying functionality to allow mutation of Infrastructure status values.
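The spec/status design described in the previous paragraph could look roughly like the following Go sketch. The types are hypothetical stand-ins, not the real Infrastructure or ControllerConfig API. An empty spec field is treated as "nothing requested" because existing clusters only populate status, and status is only updated after the reconfiguration callback succeeds.

```go
package main

import "fmt"

// vipConfig is a hypothetical, trimmed-down stand-in for the VIP fields; it
// is not the real Infrastructure API type.
type vipConfig struct {
	APIVIPs     []string
	IngressVIPs []string
}

// infrastructure pairs what was requested (Spec) with what is deployed (Status).
type infrastructure struct {
	Spec   vipConfig
	Status vipConfig
}

func equal(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func clone(s []string) []string { return append([]string(nil), s...) }

// pending reports whether a spec field requests a change. An empty spec field
// means nothing was requested.
func pending(spec, status []string) bool {
	return len(spec) > 0 && !equal(spec, status)
}

// reconcile reconfigures the affected services when spec and status differ,
// and only updates status once the reconfiguration has succeeded.
func reconcile(infra *infrastructure, reconfigure func(vipConfig) error) (bool, error) {
	apiChanged := pending(infra.Spec.APIVIPs, infra.Status.APIVIPs)
	ingressChanged := pending(infra.Spec.IngressVIPs, infra.Status.IngressVIPs)
	if !apiChanged && !ingressChanged {
		return false, nil
	}
	desired := vipConfig{APIVIPs: clone(infra.Status.APIVIPs), IngressVIPs: clone(infra.Status.IngressVIPs)}
	if apiChanged {
		desired.APIVIPs = clone(infra.Spec.APIVIPs)
	}
	if ingressChanged {
		desired.IngressVIPs = clone(infra.Spec.IngressVIPs)
	}
	if err := reconfigure(desired); err != nil {
		return false, fmt.Errorf("reconfiguring services: %w", err)
	}
	infra.Status = desired
	return true, nil
}

func main() {
	infra := &infrastructure{
		Spec:   vipConfig{APIVIPs: []string{"192.0.2.10", "2001:db8::10"}},
		Status: vipConfig{APIVIPs: []string{"192.0.2.10"}, IngressVIPs: []string{"192.0.2.20"}},
	}
	changed, err := reconcile(infra, func(c vipConfig) error {
		fmt.Println("reconfiguring for", c.APIVIPs, c.IngressVIPs)
		return nil
	})
	fmt.Println(changed, err, infra.Status.APIVIPs)
}
```

Because spec and status can differ while a reconfiguration is in flight, this is also where ControllerConfig would stop mirroring Infrastructure.Status exactly.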

Depending on how the design works out, we may only track the design phase in this epic and do the implementation as part of one of the other epics. If there is common logic that is needed by all and can be implemented independently we could do that under this epic though.

Story OPNET-415: Revert the API revert for 4.16

View the Description View the linked PRs

Tasks to do here

Revert the revert of API extension
Revendor API in client-go
Revendor API in installer

https://github.com/openshift/installer/pull/7869

Feature OCPSTRAT-184: Agent Based Installer for IBM Power and zSystems

View the Description

Feature Overview (aka. Goal Summary)

The Agent Based installer is a clean and simple way to install new instances of OpenShift in disconnected environments, guiding the user through the questions and information needed to successfully install an OpenShift cluster. We need to bring this highly useful feature to the IBM Power and IBM zSystems architectures.

Goals (aka. expected user outcomes)

Agent based installer on Power and zSystems should reflect what is available for x86 today.

Requirements (aka. Acceptance Criteria):

Able to use the agent based installer to create OpenShift clusters on Power and zSystem architectures in disconnected environments

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

High-level list of items that are out of scope. Initial completion during Refinement status.

Background

Provide any additional context needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.

Epic MULTIARCH-2674: Agent Based Installer for P/Z (no ISO support for Z)

View the Description

Epic Goal

The goal of this Epic is to enable Agent Based Installer for P/Z

Why is this important?

The Agent Based installer is a research Spike item for the Multi-Arch team during the 4.12 release and later

Scenarios
1. …

Acceptance Criteria

See "Definition of Done" below

Dependencies (internal and external)
1. …

Previous Work (Optional):
1. …

Open questions:
1. …

Done Checklist

CI - For new features (non-enablement), existing Multi-Arch CI jobs are not broken by the Epic
Release Enablement: <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - If the Epic is adding a new stream, downstream build attached to advisory: <link to errata>
QE - Test plans in Test Plan tracking software (e.g. Polarion, RQM, etc.): <link or reference to the Test Plan>
QE - Automated tests merged: <link or reference to automated tests>
QE - QE to verify documentation when testing
DOC - Downstream documentation merged: <link to meaningful PR>
All the stories, tasks, sub-tasks and bugs that belong to this epic need to have been completed and indicated by a status of 'Done'.

Task MULTIARCH-3701: enable ppc64le for agent installer

View the Description View the linked PRs

Enable openshift-install to create the agent-based install ISO for Power (ppc64le).

https://github.com/openshift/installer/pull/7366

Story MULTIARCH-2678: Build an environment and perform engineering validation s390x

View the Description View the linked PRs

As the multi-arch engineer, I would like to build an environment and deploy using Agent Based installer, so that I can confirm if the feature works per spec.

Entrance Criteria

(If there is research) research completed and proven that the feature could be done

Acceptance Criteria

“Proof” of verification (Logs, etc.)
If independent test code written, a link to the code added to the JIRA story

https://github.com/openshift/installer/pull/7712

Feature OCPSTRAT-193: Automatically restart storage operators pods when the CA certificates are updated

View the Description

Feature Overview (aka. Goal Summary)

The storage operators need to be automatically restarted after the certificates are renewed.

From OCP doc "The service CA certificate, which issues the service certificates, is valid for 26 months and is automatically rotated when there is less than 13 months validity left."

Since OCP is now offering an 18 months lifecycle per release, the storage operator pods need to be automatically restarted after the certificates are renewed.

Goals (aka. expected user outcomes)

The storage operators will be restarted transparently. The customer benefit is that manual restarts of the storage operators are no longer needed.

Requirements (aka. Acceptance Criteria):

The administrator should not need to restart the storage operators when certificates are renewed.

This should apply to all relevant operators with a consistent experience.

Use Cases (Optional):

As an administrator I want the storage operators to be automatically restarted when certificates are renewed.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

High-level list of items that are out of scope. Initial completion during Refinement status.

Background

This feature request is triggered by the new extended OCP lifecycle. We are moving from 12 to 18 months support per release.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

No doc is required

Interoperability Considerations

This feature only covers storage, but the same behavior should be applied to every relevant component.

Epic STOR-1445: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Story STOR-1443: Automatically restart `csi-snapshot-webhook` pods when the secret `csi-snapshot-webhook-secret` is updated

View the Description View the linked PRs

The pod `csi-snapshot-webhook` mounts the secret:
```

$ cat assets/webhook/deployment.yaml
kind: Deployment
metadata:
  name: csi-snapshot-webhook
  ...
spec:
  template:
    spec:
      containers:

        volumeMounts:
          - name: certs
            mountPath: /etc/snapshot-validation-webhook/certs

      volumes:
      - name: certs
        secret:
          secretName: csi-snapshot-webhook-secret

```
Hence, if the secret is updated (e.g. as a result of a CA cert update), the Pod must be restarted.
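One common way to get this behavior, sketched here in Go and not necessarily what the linked PR implements, is to hash the secret's data into a pod template annotation: when the secret changes, the annotation changes, and the Deployment controller rolls out new pods. The annotation key and helper names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// hashAnnotation is a hypothetical annotation key.
const hashAnnotation = "example.openshift.io/secret-hash"

// secretHash returns a deterministic SHA-256 over a secret's data. Keys are
// sorted because Go map iteration order is random.
func secretHash(data map[string][]byte) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := sha256.New()
	for _, k := range keys {
		// Length-prefix keys and values so different splits hash differently.
		fmt.Fprintf(h, "%d:%s%d:", len(k), k, len(data[k]))
		h.Write(data[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}

// setHashAnnotation stores the hash in the pod template annotations; a changed
// pod template triggers a Deployment rollout.
func setHashAnnotation(annotations map[string]string, data map[string][]byte) map[string]string {
	if annotations == nil {
		annotations = map[string]string{}
	}
	annotations[hashAnnotation] = secretHash(data)
	return annotations
}

func main() {
	before := setHashAnnotation(nil, map[string][]byte{"tls.crt": []byte("old-cert")})
	after := setHashAnnotation(nil, map[string][]byte{"tls.crt": []byte("new-cert")})
	// Different annotation values mean a changed pod template, hence a rollout.
	fmt.Println(before[hashAnnotation] != after[hashAnnotation])
}
```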

https://github.com/openshift/cluster-csi-snapshot-controller-operator/pull/157

Story STOR-1446: Automatically restart `vsphere-problem-detector-operator` pods when the secret `vsphere-problem-detector-serving-cert` is updated

View the Description View the linked PRs

The pod `vsphere-problem-detector-operator` mounts the secret:

```
$ cat assets/vsphere_problem_detector/07_deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: vsphere-problem-detector-operator

    spec:
      containers:

        volumeMounts:
        - mountPath: /var/run/secrets/serving-cert
          name: vsphere-problem-detector-serving-cert

      volumes:
      - name: vsphere-problem-detector-serving-cert
        secret:
          secretName: vsphere-problem-detector-serving-cert
```

Hence, if the secret is updated (e.g. as a result of a CA cert update), the Pod must be restarted.

https://github.com/openshift/cluster-storage-operator/pull/394

Feature OCPSTRAT-243: Custom roles for GCP Workload Identity

View the Description

BU Priority Overview

Create custom roles for GCP with the minimal set of required permissions.

Goals

Enable customers to better scope credential permissions and create custom roles on GCP that only include the minimum subset of what is needed for OpenShift.

State of the Business

Some of the service accounts that CCO creates (e.g. a service account with the role roles/iam.serviceAccountUser) have elevated permissions that are not required or used by the requesting OpenShift components. This is because we use GCP predefined roles, which come with a number of additional permissions. The goal is to create custom roles with only the required permissions.

Execution Plans

TBD

Epic IR-407: GCP role granularity

View the Description

Epic Goal

Why is this important?

Scenarios

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
...

Dependencies (internal and external)

Previous Work (Optional):

Open questions:

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Story IR-408: Update GCP Credentials Request manifest of the Cluster Image Registry Operator to use new API field for requesting permissions

View the Description View the linked PRs

Evaluate if any of the GCP predefined roles in the credentials request manifest of Cluster Image Registry Operator give elevated permissions. Remove any such predefined role from spec.predefinedRoles field and replace it with required permissions in the new spec.permissions field.

The new GCP provider spec for credentials request CR is as follows:

```go
import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

type GCPProviderSpec struct {
	metav1.TypeMeta `json:",inline"`
	// PredefinedRoles is the list of GCP pre-defined roles
	// that the CredentialsRequest requires.
	PredefinedRoles []string `json:"predefinedRoles"`
	// Permissions is the list of GCP permissions required to
	// create a more fine-grained custom role to satisfy the
	// CredentialsRequest.
	// When both Permissions and PredefinedRoles are specified,
	// the service account will have the union of permissions
	// from both fields.
	Permissions []string `json:"permissions"`
	// SkipServiceCheck can be set to true to skip the check whether the requested roles or permissions
	// have the necessary services enabled
	// +optional
	SkipServiceCheck bool `json:"skipServiceCheck,omitempty"`
}
```

We can use the following command to check the permissions associated with a GCP predefined role:

```
gcloud iam roles describe <role_name>
```

The sample output for the role roleViewer is as follows. The permissions are listed in the "includedPermissions" field.

```
$ gcloud iam roles describe roles/iam.roleViewer
description: Read access to all custom roles in the project.
etag: AA==
includedPermissions:
- iam.roles.get
- iam.roles.list
- resourcemanager.projects.get
- resourcemanager.projects.getIamPolicy
name: roles/iam.roleViewer
stage: GA
```
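The union rule from the `GCPProviderSpec` comment, and the evaluation step (finding which permissions of a predefined role are not actually needed), can be sketched in Go as set operations over the `includedPermissions` lists that `gcloud iam roles describe` prints. The helpers are hypothetical, and role data is passed in rather than fetched from GCP.

```go
package main

import (
	"fmt"
	"sort"
)

// effectivePermissions returns the union of the permissions granted by the
// predefined roles and the explicitly listed permissions, sorted.
// rolePerms maps a role name to its includedPermissions.
func effectivePermissions(rolePerms map[string][]string, predefinedRoles, permissions []string) ([]string, error) {
	set := map[string]struct{}{}
	for _, role := range predefinedRoles {
		perms, ok := rolePerms[role]
		if !ok {
			return nil, fmt.Errorf("unknown role %q", role)
		}
		for _, p := range perms {
			set[p] = struct{}{}
		}
	}
	for _, p := range permissions {
		set[p] = struct{}{}
	}
	out := make([]string, 0, len(set))
	for p := range set {
		out = append(out, p)
	}
	sort.Strings(out)
	return out, nil
}

// excessPermissions lists permissions a role grants beyond what is required,
// i.e. the reason to replace the role with spec.permissions.
func excessPermissions(granted, required []string) []string {
	need := make(map[string]bool, len(required))
	for _, p := range required {
		need[p] = true
	}
	var extra []string
	for _, p := range granted {
		if !need[p] {
			extra = append(extra, p)
		}
	}
	sort.Strings(extra)
	return extra
}

func main() {
	// includedPermissions of roles/iam.roleViewer, from the sample output.
	rolePerms := map[string][]string{
		"roles/iam.roleViewer": {
			"iam.roles.get", "iam.roles.list",
			"resourcemanager.projects.get", "resourcemanager.projects.getIamPolicy",
		},
	}
	required := []string{"iam.roles.get", "iam.roles.list"}
	fmt.Println("excess:", excessPermissions(rolePerms["roles/iam.roleViewer"], required))
	eff, err := effectivePermissions(rolePerms, nil, required)
	fmt.Println("effective:", eff, err)
}
```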

https://github.com/openshift/cluster-image-registry-operator/pull/935

Bug OCPBUGS-24684: CIRO should use granular roles on GCP

View the Description View the linked PRs

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/cluster-image-registry-operator/pull/972

Epic CCO-285: GCP openshift role granularity enhancement - phase 2

View the Description

These are phase 2 items from ~~CCO-188~~

Moving items from other teams that need to be committed to for 4.13 for this work to complete

Epic Goal

Build a list of the specific permissions needed to run OpenShift on GCP. Components currently grant roles, but we need more granularity; custom roles now make this possible, which was not the case when the permission capabilities for GCP were originally written.

Why is this important?

Some of the service accounts that CCO creates (e.g. a service account with the role roles/iam.serviceAccountUser) have elevated permissions that are not required or used by the requesting OpenShift components. This is because we use GCP predefined roles, which come with a number of additional permissions. The goal is to create custom roles with only the required permissions.

Story CCO-244: Update GCP Credentials Request manifest of the Cloud Credentials Operator to use new API field for requesting permissions

View the Description View the linked PRs

Evaluate if any of the GCP predefined roles in the credentials request manifest of cloud credentials operator give elevated permissions. Remove any such predefined role from spec.predefinedRoles field and replace it with required permissions in the new spec.permissions field.

The new GCP provider spec for credentials request CR is as follows:

```go
import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

type GCPProviderSpec struct {
	metav1.TypeMeta `json:",inline"`
	// PredefinedRoles is the list of GCP pre-defined roles
	// that the CredentialsRequest requires.
	PredefinedRoles []string `json:"predefinedRoles"`
	// Permissions is the list of GCP permissions required to
	// create a more fine-grained custom role to satisfy the
	// CredentialsRequest.
	// When both Permissions and PredefinedRoles are specified,
	// the service account will have the union of permissions
	// from both fields.
	Permissions []string `json:"permissions"`
	// SkipServiceCheck can be set to true to skip the check whether the requested roles or permissions
	// have the necessary services enabled
	// +optional
	SkipServiceCheck bool `json:"skipServiceCheck,omitempty"`
}
```

We can use the following command to check the permissions associated with a GCP predefined role:

```
gcloud iam roles describe <role_name>
```

The sample output for the role roleViewer is as follows. The permissions are listed in the "includedPermissions" field.

```
$ gcloud iam roles describe roles/iam.roleViewer
description: Read access to all custom roles in the project.
etag: AA==
includedPermissions:
- iam.roles.get
- iam.roles.list
- resourcemanager.projects.get
- resourcemanager.projects.getIamPolicy
name: roles/iam.roleViewer
stage: GA
title: Role Viewer
```

https://github.com/openshift/cloud-credential-operator/pull/626

Story CCO-251: Update GCP Credentials Request manifest of the Cluster Storage Operator to use new API field for requesting permissions

View the Description View the linked PRs

Evaluate if any of the GCP predefined roles in the credentials request manifest of Cluster Storage Operator give elevated permissions. Remove any such predefined role from spec.predefinedRoles field and replace it with required permissions in the new spec.permissions field.

The new GCP provider spec for credentials request CR is as follows:

```go
import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

type GCPProviderSpec struct {
	metav1.TypeMeta `json:",inline"`
	// PredefinedRoles is the list of GCP pre-defined roles
	// that the CredentialsRequest requires.
	PredefinedRoles []string `json:"predefinedRoles"`
	// Permissions is the list of GCP permissions required to
	// create a more fine-grained custom role to satisfy the
	// CredentialsRequest.
	// When both Permissions and PredefinedRoles are specified,
	// the service account will have the union of permissions
	// from both fields.
	Permissions []string `json:"permissions"`
	// SkipServiceCheck can be set to true to skip the check whether the requested roles or permissions
	// have the necessary services enabled
	// +optional
	SkipServiceCheck bool `json:"skipServiceCheck,omitempty"`
}
```

We can use the following command to check the permissions associated with a GCP predefined role:

```
gcloud iam roles describe <role_name>
```

The sample output for the role roleViewer is as follows. The permissions are listed in the "includedPermissions" field.

```
$ gcloud iam roles describe roles/iam.roleViewer
description: Read access to all custom roles in the project.
etag: AA==
includedPermissions:
- iam.roles.get
- iam.roles.list
- resourcemanager.projects.get
- resourcemanager.projects.getIamPolicy
name: roles/iam.roleViewer
stage: GA
title: Role Viewer
```

https://github.com/openshift/cluster-storage-operator/pull/410

Bug OCPBUGS-28850: Implement per-project custom role creation in ccoctl

View the Description

Rather than creating custom roles per cluster, as is currently implemented for GCP, ccoctl should create custom roles per project because of GCP's custom role deletion policies. When a custom role is deleted in GCP, it continues to exist and counts toward quota for 7 days. Custom roles are not permanently deleted for up to 14 days after deletion; ref: https://cloud.google.com/iam/docs/creating-custom-roles#deleting-custom-role.

Deletion should ignore these per-project custom roles by default and provide an optional flag to delete them.

Since the custom roles must be created per project, permission deltas must be additive. We can't remove permissions under these restrictions, since previous versions may rely on those custom role permissions.

Post a warning/info message about the permission delta so that users are aware of the extra permissions and can clean them up if they are sure those permissions aren't being used.
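A minimal sketch of the additive update and warning described above, assuming the existing custom role's permissions and the release's required permissions are already known. `updateProjectRole` is a hypothetical helper, not ccoctl's implementation, and the warning text is illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

// updateProjectRole merges newly required permissions into an existing
// per-project custom role. Permissions are only ever added, because another
// cluster in the same project may still rely on ones this release no longer
// requests. It also returns the permissions kept in the role but not required
// now, so the caller can warn about them.
func updateProjectRole(existing, required []string) (merged, extra []string) {
	req := make(map[string]bool, len(required))
	for _, p := range required {
		req[p] = true
	}
	seen := map[string]bool{}
	for _, p := range existing {
		if seen[p] {
			continue
		}
		seen[p] = true
		merged = append(merged, p)
		if !req[p] {
			extra = append(extra, p)
		}
	}
	for _, p := range required {
		if !seen[p] {
			seen[p] = true
			merged = append(merged, p)
		}
	}
	sort.Strings(merged)
	sort.Strings(extra)
	return merged, extra
}

func main() {
	merged, extra := updateProjectRole(
		[]string{"compute.instances.get", "compute.instances.delete"},
		[]string{"compute.instances.get", "compute.instances.create"},
	)
	fmt.Println("merged:", merged)
	if len(extra) > 0 {
		fmt.Printf("warning: custom role keeps %d permission(s) not required by this release: %v\n", len(extra), extra)
	}
}
```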

Epic OCPCLOUD-1718: Update GCP Credentials Request manifests of the OpenShift components to use new API field for requesting permissions

View the Description

Evaluate if any of the GCP predefined roles in the credentials request manifests of OpenShift cluster operators give elevated permissions. Remove any such predefined role from spec.predefinedRoles field and replace it with required permissions in the new spec.permissions field.

The new GCP provider spec for credentials request CR is as follows:

```go
import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

type GCPProviderSpec struct {
	metav1.TypeMeta `json:",inline"`
	// PredefinedRoles is the list of GCP pre-defined roles
	// that the CredentialsRequest requires.
	PredefinedRoles []string `json:"predefinedRoles"`
	// Permissions is the list of GCP permissions required to
	// create a more fine-grained custom role to satisfy the
	// CredentialsRequest.
	// When both Permissions and PredefinedRoles are specified,
	// the service account will have the union of permissions
	// from both fields.
	Permissions []string `json:"permissions"`
	// SkipServiceCheck can be set to true to skip the check whether the requested roles or permissions
	// have the necessary services enabled
	// +optional
	SkipServiceCheck bool `json:"skipServiceCheck,omitempty"`
}
```

We can use the following command to check the permissions associated with a GCP predefined role:

```
gcloud iam roles describe <role_name>
```

The sample output for the role roleViewer is as follows. The permissions are listed in the "includedPermissions" field.

```
$ gcloud iam roles describe roles/iam.roleViewer
description: Read access to all custom roles in the project.
etag: AA==
includedPermissions:
- iam.roles.get
- iam.roles.list
- resourcemanager.projects.get
- resourcemanager.projects.getIamPolicy
name: roles/iam.roleViewer
stage: GA
title: Role Viewer
```

Story OCPCLOUD-1724: Update GCP Credentials Request manifest of the Cloud Controller Manager Operator to use new API field for requesting permissions

View the Description View the linked PRs

Evaluate if any of the GCP predefined roles in the credentials request manifest of Cloud Controller Manager Operator give elevated permissions. Remove any such predefined role from spec.predefinedRoles field and replace it with required permissions in the new spec.permissions field.

The new GCP provider spec for credentials request CR is as follows:

```go
import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

type GCPProviderSpec struct {
	metav1.TypeMeta `json:",inline"`
	// PredefinedRoles is the list of GCP pre-defined roles
	// that the CredentialsRequest requires.
	PredefinedRoles []string `json:"predefinedRoles"`
	// Permissions is the list of GCP permissions required to
	// create a more fine-grained custom role to satisfy the
	// CredentialsRequest.
	// When both Permissions and PredefinedRoles are specified,
	// the service account will have the union of permissions
	// from both fields.
	Permissions []string `json:"permissions"`
	// SkipServiceCheck can be set to true to skip the check whether the requested roles or permissions
	// have the necessary services enabled
	// +optional
	SkipServiceCheck bool `json:"skipServiceCheck,omitempty"`
}
```

We can use the following command to check the permissions associated with a GCP predefined role:

```
gcloud iam roles describe <role_name>
```

The sample output for the role roleViewer is as follows. The permissions are listed in the "includedPermissions" field.

```
$ gcloud iam roles describe roles/iam.roleViewer
description: Read access to all custom roles in the project.
etag: AA==
includedPermissions:
- iam.roles.get
- iam.roles.list
- resourcemanager.projects.get
- resourcemanager.projects.getIamPolicy
name: roles/iam.roleViewer
stage: GA
title: Role Viewer
```

https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/320

Story OCPCLOUD-1726: Update GCP Credentials Request manifest of the Cluster CAPI Operator to use new API field for requesting permissions

View the Description View the linked PRs

Evaluate if any of the GCP predefined roles in the credentials request manifest of Cluster CAPI Operator give elevated permissions. Remove any such predefined role from spec.predefinedRoles field and replace it with required permissions in the new spec.permissions field.

The new GCP provider spec for credentials request CR is as follows:

type GCPProviderSpec struct {
   metav1.TypeMeta `json:",inline"`
   // PredefinedRoles is the list of GCP pre-defined roles
   // that the CredentialsRequest requires.
   PredefinedRoles []string `json:"predefinedRoles"`
   // Permissions is the list of GCP permissions required to
   // create a more fine-grained custom role to satisfy the
   // CredentialsRequest.
   // When both Permissions and PredefinedRoles are specified,
   // the service account will have the union of permissions
   // from both fields.
   Permissions []string `json:"permissions"`
   // SkipServiceCheck can be set to true to skip checking whether the requested roles or permissions
   // have the necessary services enabled.
   // +optional
   SkipServiceCheck bool `json:"skipServiceCheck,omitempty"`
}

We can use the following command to check the permissions associated with a GCP predefined role:

gcloud iam roles describe <role_name>

The sample output for the role roleViewer is as follows. The permissions are listed in the "includedPermissions" field.

$ gcloud iam roles describe roles/iam.roleViewer
description: Read access to all custom roles in the project.
etag: AA==
includedPermissions:
- iam.roles.get
- iam.roles.list
- resourcemanager.projects.get
- resourcemanager.projects.getIamPolicy
name: roles/iam.roleViewer
stage: GA
title: Role Viewer

https://github.com/openshift/cluster-capi-operator/pull/155

Story OCPCLOUD-1725: Update GCP Credentials Request manifest of the Machine API Operator to use new API field for requesting permissions

View the Description View the linked PRs

Evaluate if any of the GCP predefined roles in the credentials request manifest of machine api operator give elevated permissions. Remove any such predefined role from spec.predefinedRoles field and replace it with required permissions in the new spec.permissions field.
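The evaluate-and-replace step can be sketched in Go as a transformation on the two list fields of `GCPProviderSpec`. This is a minimal illustration, not the operator's code. The helper `replacePredefinedRole` is hypothetical. `roles/compute.admin` and the two compute permissions are example inputs standing in for the result of the manual evaluation. They are not a claim about what the Machine API Operator actually requests.

```go
package main

import (
	"fmt"
	"sort"
)

// gcpProviderSpec mirrors the PredefinedRoles and Permissions fields of the
// GCPProviderSpec (TypeMeta and SkipServiceCheck omitted for brevity).
type gcpProviderSpec struct {
	PredefinedRoles []string `json:"predefinedRoles"`
	Permissions     []string `json:"permissions"`
}

// replacePredefinedRole removes role from spec.PredefinedRoles and adds the
// required permissions to spec.Permissions, skipping duplicates. It returns
// false and leaves the spec unchanged if the role was not present.
func replacePredefinedRole(spec *gcpProviderSpec, role string, required []string) bool {
	found := false
	kept := make([]string, 0, len(spec.PredefinedRoles))
	for _, r := range spec.PredefinedRoles {
		if r == role {
			found = true
			continue
		}
		kept = append(kept, r)
	}
	if !found {
		return false
	}
	spec.PredefinedRoles = kept

	have := make(map[string]bool, len(spec.Permissions))
	for _, p := range spec.Permissions {
		have[p] = true
	}
	for _, p := range required {
		if !have[p] {
			spec.Permissions = append(spec.Permissions, p)
			have[p] = true
		}
	}
	sort.Strings(spec.Permissions)
	return true
}

func main() {
	// Hypothetical starting point: one elevated role plus a narrower one.
	spec := gcpProviderSpec{
		PredefinedRoles: []string{"roles/compute.admin", "roles/iam.roleViewer"},
	}
	// Hypothetical outcome of the evaluation: only these permissions are needed.
	required := []string{"compute.instances.get", "compute.instances.create"}
	replacePredefinedRole(&spec, "roles/compute.admin", required)
	fmt.Println(spec.PredefinedRoles, spec.Permissions)
}
```

The function builds a new slice instead of filtering in place, so the caller's original `PredefinedRoles` backing array is never modified behind its back.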

The new GCP provider spec for credentials request CR is as follows:

type GCPProviderSpec struct {
   metav1.TypeMeta `json:",inline"`
   // PredefinedRoles is the list of GCP pre-defined roles
   // that the CredentialsRequest requires.
   PredefinedRoles []string `json:"predefinedRoles"`
   // Permissions is the list of GCP permissions required to
   // create a more fine-grained custom role to satisfy the
   // CredentialsRequest.
   // When both Permissions and PredefinedRoles are specified,
   // the service account will have the union of permissions
   // from both fields.
   Permissions []string `json:"permissions"`
   // SkipServiceCheck can be set to true to skip checking whether the requested roles or permissions
   // have the necessary services enabled.
   // +optional
   SkipServiceCheck bool `json:"skipServiceCheck,omitempty"`
}

We can use the following command to check the permissions associated with a GCP predefined role:

gcloud iam roles describe <role_name>

The sample output for the role roleViewer is as follows. The permissions are listed in the "includedPermissions" field.

$ gcloud iam roles describe roles/iam.roleViewer
description: Read access to all custom roles in the project.
etag: AA==
includedPermissions:
- iam.roles.get
- iam.roles.list
- resourcemanager.projects.get
- resourcemanager.projects.getIamPolicy
name: roles/iam.roleViewer
stage: GA
title: Role Viewer

https://github.com/openshift/machine-api-operator/pull/1196

Feature OCPSTRAT-257: Nutanix Zonal: Multiple regions and zones support for Nutanix IPI and UPI

View the Description

Feature Overview

As an Infrastructure Administrator, I want to deploy OpenShift on Nutanix distributing the control plane and compute nodes across multiple regions and zones, forming different failure domains.

As an Infrastructure Administrator, I want to configure an existing OpenShift cluster to distribute the nodes across regions and zones, forming different failure domains.

Goals

Install OpenShift on Nutanix using IPI / UPI in multiple regions and zones.

Requirements (aka. Acceptance Criteria):

Ensure Nutanix IPI can successfully be deployed with ODF across multiple zones (like we do with vSphere, AWS, GCP & Azure)
Ensure zonal configuration in Nutanix using UPI is documented and tested

vSphere Implementation

This implementation would follow the same idea that has been done for vSphere. The following are the main PRs for vSphere:

https://github.com/openshift/enhancements/blob/master/enhancements/installer/vsphere-ipi-zonal.md

OpenShift 4.12

OpenShift 4.13
https://github.com/openshift/installer/pull/6770
https://github.com/openshift/installer/pull/6782
https://github.com/openshift/installer/pull/6750
https://github.com/openshift/installer/pull/6738
https://github.com/openshift/installer/pull/6612
https://github.com/openshift/installer/pull/6327
https://github.com/openshift/api/pull/1388
https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/224
https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/218
https://github.com/openshift/openshift-docs/pull/54788
https://github.com/openshift/installer/pull/6905

OpenShift 4.13

Existing vSphere documentation

https://docs.openshift.com/container-platform/4.13/installing/installing_vsphere/installing-vsphere-installer-provisioned-customizations.html#configuring-vsphere-regions-zones_installing-vsphere-installer-provisioned-customizations

https://docs.openshift.com/container-platform/4.13/post_installation_configuration/post-install-vsphere-zones-regions-configuration.html

Epic CORS-2728: Nutanix Zonal: Multiple regions and zones support for Nutanix IPI and Assisted Installer (PRs Reviews)

View the Description

Epic Goal

Nutanix Zonal: Multiple regions and zones support for Nutanix IPI and Assisted Installer

Note

The Nutanix Engineering team is driving this implementation based on the vSphere zonal implementation, led by Yanhua Li.
The Installer team is expected to just review PRs.
PRs are expected in these repos:

Spike SPLAT-1272: Nutanix Zonal: Multiple zones support for Nutanix IPI and UPI

View the Description View the linked PRs

As a user, I want to be able to spread the control plane nodes of an OCP cluster across Prism Elements (zones).

Feature OCPSTRAT-330: [Upstream] OpenShift AutoScaler (Phase 3)

View the Description

Feature Overview
This is tech debt and doesn't impact OpenShift users.
As the autoscaler has become a key feature of OpenShift, there is a requirement to continue expanding its use, bringing all features to all cloud platforms and contributing to the upstream community. This feature tracks the initiatives associated with the Autoscaler in OpenShift.

Goals

Scale from zero available on all cloud providers (where available)
Required upstream work
Work needed as a result of rebase to new kubernetes version

Requirements

Requirement	Notes	isMvp?
vSphere autoscaling from zero		No
Upstream E2E testing		No
Upstream adapt scale from zero replicas		No

Out of Scope

n/a

Background, and strategic fit
Autoscaling is a key benefit of the Machine API and should be made available on all providers

Assumptions

Customer Considerations

Documentation Considerations

Target audience: cluster admins
Updated content: update docs to mention any change to where the features are available.

Epic OCPCLOUD-2136: Update autoscaling annotations to accommodate upstream keys

View the Description

Epic Goal

Update the scale from zero autoscaling annotations on MachineSets to conform with the upstream keys, while continuing to accept the OpenShift-specific keys that we have been using.

Why is this important?

This change makes our implementation of the cluster autoscaler conform to the API described by the upstream community. This reduces the mental overhead for someone who knows Kubernetes but is new to OpenShift.
This change also reduces the maintenance burden that we carry in the form of additional patches to the cluster autoscaler. By changing our controllers to understand the upstream annotations, we can remove extra patches from our fork of the cluster autoscaler, making future maintenance easier and keeping us closer to the upstream source.

Scenarios

A user is debugging a cluster autoscaler issue by examining the related MachineSet objects. They see the scale from zero annotations and recognize them from the project documentation and from upstream discussions. As a result, the user can more easily find common issues and advice from the upstream community.
An OpenShift maintainer is updating the cluster autoscaler for a new version of Kubernetes. Because the OpenShift controllers understand the upstream annotations, the maintainer does not need to carry or modify a patch to support multiple varieties of annotation. This makes updating the autoscaler simpler and reduces the burden on the maintainer.

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
Scale from zero autoscaling must continue to work with both the old openshift annotations and the newer upstream annotations.

Dependencies (internal and external)

Previous Work (Optional):

Open questions::

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - OpenShift code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - OpenShift documentation merged: <link to meaningful PR or GitHub Issue>
DEV - OpenShift build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - OpenShift documentation merged: <link to meaningful PR>

Please note: the changes described by this epic will happen in OpenShift controllers, so there is no "upstream" relationship in the same sense as with the Kubernetes-based controllers.

Story OCPCLOUD-2138: Create scale from zero annotations util module in MAO

View the Description View the linked PRs

User Story

As a developer I want to have a consistent way to apply the scale from zero annotations so that it is easier to update the various provider machineset actuators. Having a utility module in the MAO will make this easier by providing a single place for all the MachineSet actuators to share.

Background

Currently the individual provider MachineSet actuators each contain string variables and independent implementations of the scale from zero annotations. This configuration is more brittle than having a central module which could be utilized by all the providers.

Steps

Create a scale from zero utility module in MAO that contains both the upstream and downstream scale from zero annotations.
Add convenience functions for checking, adding, and updating the MachineSet annotations.
Add unit tests for any functions with conditional logic
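The steps above can be sketched as a small Go module holding both key sets plus check and set helpers. Only the two prefixes come from this document (OCPCLOUD-2137). The exact suffixes (`vCPU`, `memoryMb`, `cpu`, `memory`) and the assumption that the OpenShift memory value is an integer in MiB while the upstream value is a Kubernetes quantity are stated here as assumptions. The function names are hypothetical, not the MAO's actual API.

```go
package main

import (
	"fmt"
	"strconv"
)

// Annotation keys. The prefixes come from the story text; the suffixes are
// assumptions made for this sketch.
const (
	openshiftCPUKey    = "machine.openshift.io/vCPU"
	openshiftMemoryKey = "machine.openshift.io/memoryMb"
	upstreamCPUKey     = "capacity.cluster-autoscaler.kubernetes.io/cpu"
	upstreamMemoryKey  = "capacity.cluster-autoscaler.kubernetes.io/memory"
)

// hasScaleFromZeroAnnotations reports whether either the OpenShift or the
// upstream CPU and memory annotations are both present.
func hasScaleFromZeroAnnotations(annotations map[string]string) bool {
	_, osCPU := annotations[openshiftCPUKey]
	_, osMem := annotations[openshiftMemoryKey]
	_, upCPU := annotations[upstreamCPUKey]
	_, upMem := annotations[upstreamMemoryKey]
	return (osCPU && osMem) || (upCPU && upMem)
}

// setCPU writes the vCPU count under both the OpenShift and upstream keys.
func setCPU(annotations map[string]string, vCPU int64) {
	v := strconv.FormatInt(vCPU, 10)
	annotations[openshiftCPUKey] = v
	annotations[upstreamCPUKey] = v
}

// setMemory writes memory under both keys. Assumption: the OpenShift key holds
// a plain integer in MiB, while the upstream key holds a Kubernetes quantity,
// so the same amount is written there with an explicit "Mi" suffix.
func setMemory(annotations map[string]string, memoryMiB int64) {
	v := strconv.FormatInt(memoryMiB, 10)
	annotations[openshiftMemoryKey] = v
	annotations[upstreamMemoryKey] = v + "Mi"
}

func main() {
	// The map must be non-nil; writing to a nil map panics.
	annotations := map[string]string{}
	setCPU(annotations, 4)
	setMemory(annotations, 16384)
	fmt.Println(hasScaleFromZeroAnnotations(annotations), annotations)
}
```

Keeping every key in one place means a provider actuator calls `setMemory` once instead of duplicating the unit conversion, which is the brittleness the Background section describes.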

Stakeholders

openshift cloud team

Definition of Done

Module exists in MAO with tests and can be imported by providers

Docs

developer docs should be updated in MAO

Testing

Most of the functions in this change should be getters/setters, but if conditional functions are introduced, we should have unit tests to exercise them.

https://github.com/openshift/machine-api-operator/pull/1169

Story OCPCLOUD-2137: Update CAO to recognize scale from zero annotations

View the Description View the linked PRs

User Story

As a user I want to ensure that scale from zero cluster autoscaling works well when using the upstream scaling hint annotations so that I can follow the community best practices. Having the cluster autoscaler operator monitor the scale from zero annotations, and correct them when incorrect, will confirm the correct behavior.

Background

As part of migrating the OpenShift scale from zero annotations to use the upstream annotations keys, the cluster autoscaler operator should be updated to look for these annotations on MachineSets that it is monitoring.

Currently, we use annotations with the prefix "machine.openshift.io", while upstream the prefix is "capacity.cluster-autoscaler.kubernetes.io". The CAO should be updated to recognize when a MachineSet has either set of annotations, and then ensure that both sets exist.

Adding both sets of annotations will help us during the transition to using the upstream set, and will also ensure backward compatibility with our published API.

Please note that care must be taken with the suffixes as well. Some of the OpenShift suffixes differ from upstream; in particular, the memory suffix uses a different type of calculation. As we convert our autoscaler implementation to use the upstream annotations, we must make sure that any conversions conform to upstream.


Steps

Add ability to check annotations on MachineSets
Add conversion/update to scale from zero annotations
Add unit tests to confirm the proper behavior
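The "recognize either set, then ensure both exist" behavior can be sketched as a Go reconcile step over a MachineSet's annotation map. This is not the CAO's code. The keys and the memory units (integer MiB on the OpenShift side, an `Mi` quantity upstream) are assumptions, as is the choice to reject other upstream memory units rather than guess a conversion. `syncScaleFromZero` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Assumed annotation keys: OpenShift prefix machine.openshift.io, upstream
// prefix capacity.cluster-autoscaler.kubernetes.io.
const (
	osCPU = "machine.openshift.io/vCPU"
	osMem = "machine.openshift.io/memoryMb"
	upCPU = "capacity.cluster-autoscaler.kubernetes.io/cpu"
	upMem = "capacity.cluster-autoscaler.kubernetes.io/memory"
)

// syncScaleFromZero ensures that when either key of a pair is present, the
// other one is too. Existing values are never overwritten. Memory is converted
// between integer MiB (OpenShift, assumed) and an "Mi" quantity (upstream,
// assumed); any other upstream unit is rejected rather than guessed.
func syncScaleFromZero(a map[string]string) (changed bool, err error) {
	if v, ok := a[osCPU]; ok {
		if _, ok := a[upCPU]; !ok {
			a[upCPU] = v
			changed = true
		}
	} else if v, ok := a[upCPU]; ok {
		a[osCPU] = v
		changed = true
	}

	if v, ok := a[osMem]; ok {
		if _, ok := a[upMem]; !ok {
			if _, perr := strconv.ParseInt(v, 10, 64); perr != nil {
				return changed, fmt.Errorf("invalid %s %q: %w", osMem, v, perr)
			}
			a[upMem] = v + "Mi"
			changed = true
		}
	} else if v, ok := a[upMem]; ok {
		if !strings.HasSuffix(v, "Mi") {
			return changed, fmt.Errorf("unsupported unit in %s %q: only Mi is handled in this sketch", upMem, v)
		}
		mib := strings.TrimSuffix(v, "Mi")
		if _, perr := strconv.ParseInt(mib, 10, 64); perr != nil {
			return changed, fmt.Errorf("invalid %s %q: %w", upMem, v, perr)
		}
		a[osMem] = mib
		changed = true
	}
	return changed, nil
}

func main() {
	// A MachineSet that only carries the older OpenShift annotations.
	a := map[string]string{osCPU: "4", osMem: "16384"}
	changed, err := syncScaleFromZero(a)
	fmt.Println(changed, err, a[upCPU], a[upMem])
}
```

Returning `changed` lets a controller skip the API update when both sets already exist, so repeated reconciles are no-ops.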

Stakeholders

openshift cloud team

Definition of Done

CAO is properly adding both sets of scale from zero annotations

Docs

developer docs in the CAO repo will need updating
if there are product docs around the CAO, they should be updated as well

Testing

the CAO needs full unit testing to confirm the behavior described above

https://github.com/openshift/cluster-autoscaler-operator/pull/294

Feature OCPSTRAT-360: IPI on PowerVS (GA)

View the Description

Epic Goal

Feature Overview (aka. Goal Summary)

The goal of this initiative is to help boost adoption of OpenShift on ppc64le. This can be further broken down into several key objectives.

For IBM, furthering adoption of OpenShift will continue to drive adoption of their Power hardware. In parallel, existing customers can use this to migrate their old on-premises Power workloads to a cloud environment.
For the Multi-Arch team, this represents our first opportunity to develop an IPI offering on one of the IBM platforms. Right now, we depend on IPI on libvirt to cover our CI needs; however, this is not a supported platform for customers. PowerVS would address this caveat for ppc64le.
By bringing in PowerVS, we can provide customers with the easiest possible experience to deploy and test workloads on IBM architectures.
Customers already have UPI methods to solve their on-premises OpenShift needs for ppc64le. This gives them a cloud-based option, furthering our hybrid-cloud story.

Goals (aka. expected user outcomes)

The goal of this epic is to begin the process of expanding support of OpenShift on ppc64le hardware to include IPI deployments against the IBM Power Virtual Server (PowerVS) APIs.

Requirements (aka. Acceptance Criteria):

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

High-level list of items that are out of scope. Initial completion during Refinement status.

Background

Provide any additional context is needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.

Interoperability Considerations

Epic MULTIARCH-3438: Enhancements to IPI for [Tech Preview] Power VS (IBM Public Cloud) (OCP 4.14)

View the Description

Epic Goal

Improve IPI on Power VS in the 4.14 cycle
- Changes to the installer to handle edge cases, fix bugs, and improve usability.
- No major changes are anticipated this cycle.

Running doc to describe terminologies and concepts which are specific to Power VS - https://docs.google.com/document/d/1Kgezv21VsixDyYcbfvxZxKNwszRK6GYKBiTTpEUubqw/edit?usp=sharing

Task MULTIARCH-3761: Update cluster-api-ibmcloud to avoid using deprecated flag

View the Description View the linked PRs

Flag powervs-provider-id-fmt is being deprecated and removed in upstream via PR: https://github.com/kubernetes-sigs/cluster-api-provider-ibmcloud/pull/1404.

The necessary changes must be made to use the provider-id-fmt flag instead.
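Because only the flag name changes, the update amounts to renaming the argument while keeping its value. The Go sketch below shows that rename on a container argument list. The helper `migrateProviderIDFlag` and the example arguments (including the value `v2`) are hypothetical. The actual change lives in the linked cluster-capi-operator PR and may differ.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	deprecatedFlag  = "--powervs-provider-id-fmt"
	replacementFlag = "--provider-id-fmt"
)

// migrateProviderIDFlag returns a copy of args with the deprecated flag renamed
// and its value preserved. Both the "--flag=value" form and the "--flag value"
// form (value in the next element) are handled, since only the name changes.
func migrateProviderIDFlag(args []string) []string {
	out := make([]string, len(args))
	for i, arg := range args {
		switch {
		case arg == deprecatedFlag:
			out[i] = replacementFlag
		case strings.HasPrefix(arg, deprecatedFlag+"="):
			// Keep the "=value" tail and swap only the flag name.
			out[i] = replacementFlag + strings.TrimPrefix(arg, deprecatedFlag)
		default:
			out[i] = arg
		}
	}
	return out
}

func main() {
	// Hypothetical container args; "v2" is an illustrative value.
	args := []string{"--leader-elect=true", "--powervs-provider-id-fmt=v2"}
	fmt.Println(migrateProviderIDFlag(args))
}
```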

https://github.com/openshift/cluster-capi-operator/pull/128

Feature OCPSTRAT-427: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Epic NP-41: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View Demos

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Story NP-615: Enhance Cluster Network Operator for SDN live-migration

View the Description View the linked PRs

We need to enhance the Cluster Network Operator (CNO) to automate the whole SDN live migration.

A new API will be introduced to CNO to facilitate the migration
CNO shall be able to deploy OVN-K and SDN in 'migration mode'
CNO shall be able to annotate nodes to bypass the IP allocation of OVN-K when MCO is updating the MachineConfig of nodes
CNO shall be able to redeploy the OVN-K and SDN in 'regular mode' after the migration is done.

Story NP-793: Add new API for SDN live migration

View the linked PRs

https://github.com/openshift/cluster-config-operator/pull/375

Story NP-618: [OVN-K] Allow nodes to switch role between hybrid overlay node and OVN-K node in OVN-K dynamically

View the Description View the linked PRs

During the migration, a node starts as an SDN node (a hybrid overlay node from the OVN-K perspective) and then becomes an OVN-K node, so OVN-K needs to support this dynamic role switching.

https://github.com/openshift/ovn-kubernetes/pull/1965

Feature OCPSTRAT-460: Optimized HyperShift Operator Deployment on AKS and Adaptive Environment Detection

View the Description

Goal

The goals of this feature are:

Optimize and streamline the operations of the HyperShift Operator (HO) on Azure Kubernetes Service (AKS) clusters.
Enable auto-detection of the underlying environment (managed or self-managed) to optimize the HO accordingly.

Epic HOSTEDCP-922: Azure ARO support

View the Description

Place holder epic to capture all azure tickets.

TODO: review.

Bug OCPBUGS-19957: Cloud Network Config Controller Pod Not Initializing on Azure HCP

View the Description View the linked PRs

Description of problem:

The cloud network config controller never initializes on Azure HostedClusters. This behavior exists on both Arm and x86 Azure management clusters.

Version-Release number of selected component (if applicable):

How reproducible:

Every time

Steps to Reproduce:

1. Create either an Arm or x86 Azure mgmt cluster
2. Install HO
3. Create a HostedCluster
4. Observe CNCC pod doesn't initialize

Actual results:

CNCC pod doesn't initialize

Expected results:

CNCC pod initializes

Additional info:

It looks like a secret isn't being reconciled to the CPO

% oc describe pod/cloud-network-config-controller-96567b45f-7jkl5 -n clusters-brcox-hypershift-arm
Name:                 cloud-network-config-controller-96567b45f-7jkl5
...
Events:
  Type     Reason       Age                   From               Message
  ----     ------       ----                  ----               -------
  Normal   Scheduled    5m46s                 default-scheduler  Successfully assigned clusters-brcox-hypershift-arm/cloud-network-config-controller-96567b45f-7jkl5 to ci-ln-vmb5w8k-1d09d-jr6m6-worker-centralus3-45dg5
  Warning  FailedMount  101s (x2 over 3m44s)  kubelet            Unable to attach or mount volumes: unmounted volumes=[cloud-provider-secret], unattached volumes=[], failed to process volumes=[]: timed out waiting for the condition
  Warning  FailedMount  97s (x10 over 5m47s)  kubelet            MountVolume.SetUp failed for volume "cloud-provider-secret" : secret "cloud-network-config-controller-creds" not found

https://github.com/openshift/hypershift/pull/3065

Feature OCPSTRAT-487: Pod Security Admission Integration - Restricted Enforcement

View the Description

Upstream Kubernetes deprecated PodSecurityPolicy and replaced it with a new built-in admission controller that enforces the Pod Security Standards (see here for the motivations behind the deprecation). OpenShift has its own dedicated pod admission system called Security Context Constraints (SCC). Our aim is to keep the SCC pod admission system while also giving users access to Kubernetes Pod Security Admission.

With OpenShift 4.11, we turned on Pod Security Admission with global "privileged" enforcement. Additionally, we set the "restricted" profile for warnings and audit. This configuration made it possible for users to opt their namespaces in to Pod Security Admission with per-namespace labels. We also introduced a new mechanism that automatically synchronizes the Pod Security Admission "warn" and "audit" labels.

With OpenShift 4.15, we intend to move the global configuration to enforce the "restricted" pod security profile globally. With this change, the label synchronization mechanism will also switch into a mode where it synchronizes the "enforce" Pod Security Admission label rather than the "audit" and "warn".
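The switch in synchronization mode can be illustrated with the upstream Pod Security Admission namespace label keys (`pod-security.kubernetes.io/enforce`, `/audit`, `/warn`), which are standard Kubernetes labels. Everything else in this Go sketch is hypothetical: the `syncMode` type, the `labelsFor` helper, and the idea that the level is precomputed. How OpenShift derives a namespace's level from the SCCs available to its service accounts is out of scope here.

```go
package main

import "fmt"

// Pod Security Admission namespace label keys (upstream Kubernetes).
const (
	enforceLabel = "pod-security.kubernetes.io/enforce"
	auditLabel   = "pod-security.kubernetes.io/audit"
	warnLabel    = "pod-security.kubernetes.io/warn"
)

// syncMode is a hypothetical name for the two synchronization behaviors.
type syncMode int

const (
	warnAndAudit syncMode = iota // 4.11 behavior: sync "warn" and "audit"
	enforce                      // planned 4.15 behavior: sync "enforce"
)

// labelsFor returns the PSa labels a synchronizer would set on a namespace
// whose computed level is level (for example "restricted" or "privileged").
func labelsFor(mode syncMode, level string) map[string]string {
	switch mode {
	case enforce:
		return map[string]string{enforceLabel: level}
	default:
		return map[string]string{auditLabel: level, warnLabel: level}
	}
}

func main() {
	fmt.Println(labelsFor(warnAndAudit, "restricted"))
	fmt.Println(labelsFor(enforce, "restricted"))
}
```

In warn/audit mode, a violation only produces warnings and audit annotations. In enforce mode, the same level causes admission to reject non-conforming pods, which is why the 4.15 change needs the fleet evaluation described below.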

Epic AUTH-262: Pod Security Admission Integration - Restricted Enforcement

View the Description

Epic Goal

Get Pod Security Admission to run in "restricted" mode globally by default, alongside SCC admission.

Story AUTH-442: Cluster Fleet Evaluation: Create Controller

View the Description View the linked PRs

What

Create a controller in cluster-kube-apiserver-operator that checks for pod security violations and sets the ClusterStatusConditionTypes.

Why

This will create monitoring alerts for clusters that would fail under enforced PSa.

This will help us to make a decision for enforcement on v4.15.

https://github.com/openshift/cluster-kube-apiserver-operator/pull/1588

Story AUTH-409: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Feature OCPSTRAT-518: Console: Customer Happiness (RFEs) for 4.15

View the Description

Feature Overview

Console enhancements based on customer RFEs that improve customer user experience.

Goals

This Section:* Provide high-level goal statement, providing user context and expected user outcome(s) for this feature

Requirements

This Section:* A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.

Requirement	Notes	isMvp?

CI - MUST be running successfully with test automation

This is a requirement for ALL features.

YES

Release Technical Enablement

Provide necessary release enablement details and documents.

YES

(Optional) Use Cases

This Section:

Main success scenarios - high-level user stories

Alternate flow/scenarios - high-level user stories

Questions to answer…

Out of Scope

Background, and strategic fit

This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

Assumptions

Customer Considerations

Documentation Considerations

Questions to be addressed:

What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?

Does this feature have doc impact?

New Content, Updates to existing content, Release Note, or No Doc Impact

If unsure and no Technical Writer is available, please contact Content Strategy.

What concepts do customers need to understand to be successful in [action]?

How do we expect customers will use the feature? For what purpose(s)?

What reference material might a customer want/need to complete [action]?

Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.

What is the doc impact (New Content, Updates to existing content, or Release Note)?

Epic CONSOLE-3542: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Story CONSOLE-3732: Add option to enable/disable tailing to Pod log viewer

View the Description View the linked PRs

1. Proposed title of this feature request

Add option to enable/disable tailing to log viewer

2. What is the nature and description of the request?

See https://issues.redhat.com/browse/OCPBUGS-362

prior to 4.10, on initial load, the log viewer loaded a maximum of the latest 1000 lines
in 4.10, the 1000 line limit was removed per RFE requests (see https://github.com/openshift/console/pull/10486)
multiple customers have requested the limit be returned, but this will revert the change to remove the limit per other RFE requests
given the number of different configurable options on the log viewer, UX involvement to work through the interaction of adding the tail option is likely necessary

3. Why does the customer need this?

See https://issues.redhat.com/browse/OCPBUGS-362

4. List any affected packages or components.

Management Console

AC: Add functionality for tailing logs based on UX input. This functionality should be implemented for the Pod logs view. Add an integration test.
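At the API level, the tail option maps onto the Kubernetes pod log endpoint's `tailLines` query parameter, alongside `follow` and `container`. The Go sketch below builds such a request path. It is a sketch of the underlying API only, not the console's implementation, which is not written in Go. `podLogURL` and the example pod name are hypothetical. A non-positive `tailLines` means no limit (the 4.10+ behavior), and a positive value such as 1000 restores the earlier cap.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// podLogURL builds a request path for the Kubernetes pod log subresource.
// tailLines <= 0 omits the parameter, so the full log is returned.
func podLogURL(namespace, pod, container string, follow bool, tailLines int64) string {
	q := url.Values{}
	if container != "" {
		q.Set("container", container)
	}
	if follow {
		q.Set("follow", "true")
	}
	if tailLines > 0 {
		q.Set("tailLines", strconv.FormatInt(tailLines, 10))
	}
	path := "/api/v1/namespaces/" + url.PathEscape(namespace) +
		"/pods/" + url.PathEscape(pod) + "/log"
	// url.Values.Encode sorts parameters by key.
	if enc := q.Encode(); enc != "" {
		path += "?" + enc
	}
	return path
}

func main() {
	// Hypothetical pod; stream the log but start from the last 1000 lines.
	fmt.Println(podLogURL("openshift-console", "console-example", "", true, 1000))
}
```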

UX draft - https://docs.google.com/document/u/2/d/1C9lO4JvUesAIn9U5m7Q98Tx4zw77sQGtzCJ4Wh6FAK0/edit?usp=sharing

UX contact - Tal Tobias

https://github.com/openshift/console/pull/13298

Story CONSOLE-3791: readOnlyRootFilesystem should be explicitly set to true, or to false if required, for security reasons

View the Description View the linked PRs

According to security best practice, it is recommended to set readOnlyRootFilesystem: true for all containers running on Kubernetes. Because openshift-console does not set it explicitly, the request is to evaluate the setting and, if possible, set readOnlyRootFilesystem: true; otherwise, set readOnlyRootFilesystem: false with an explanation of why the filesystem needs to be writable.

3. Why does the customer need this? (List the business requirements here)
Extensive security audits run on OpenShift Container Platform 4 highlight that many vendor-specific containers do not set readOnlyRootFilesystem: true, nor justify why readOnlyRootFilesystem: false is set.

AC: Set the readOnlyRootFilesystem field in the specs of both the console and console-operator deployments. Part of the work is to determine the value: true if the pod does not write to its filesystem, otherwise false.

https://github.com/openshift/console-operator/pull/809

Story CONSOLE-3793: Show node uptime information in the OpenShift Console

View the Description View the linked PRs

Show node uptime information in the OpenShift Console.

When the user logs into the OpenShift web console and goes to the Nodes section, it doesn't display the uptime information of each node. Currently, it only shows the date when the node was created.

The customer wants additional information about how long a node has been up (that is, since when it has been running), which is useful for tracking node restarts or failures.

https://github.com/openshift/console/pull/13312

Feature OCPSTRAT-523: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Epic NE-463: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Story CFE-887: As a developer, I want to vendor openshift/api changes for the new CRD to openshift/cluster-config-operator

View the linked PRs

https://github.com/openshift/cluster-config-operator/pull/364

Feature OCPSTRAT-53: Agent-based Installer Interactive flow

View the Description

Goal

Guided installation user experience that interacts via prompts for necessary inputs, informs of erroneous/invalid inputs, and provides status and feedback throughout the installation workflow with very few steps, that works for disconnected, on-premises environments.

Installation is performed from a bootable image that doesn't contain cluster details or user details, since these details will be collected during the installation flow after booting the image in the target nodes.

This means that the image is generic and can be used to install an OpenShift cluster in any supported environment.

Why is this important?

Customers/partners desire a guided installation experience to deploy OpenShift with a UI that includes support for disconnected, on-premises environments, and which is as flexible in terms of configuration as UPI.

We have partners that need to provide an installation image that can be used to install new clusters on any location and for any users, since their business is to sell the hardware along with OpenShift, where OpenShift needs to be installable in the destination premises.

Acceptance Criteria

This should provide an experience closely matching the current hosted service (Assisted Installer), except that it is limited to a single cluster, because the host running the service reboots and becomes a node in the cluster as part of the deployment process.

User can successfully deploy OpenShift using the installer's guided experience.
User can specify a custom registry for disconnected scenario, which may include uploading a cert and validation.
User can specify node-network configurations, at a minimum: DHCP, Static IP, VLAN and Bonds.
User can use the same image to install clusters with different settings (collected during the installation).
Documentation is updated to guide user step-by-step to deploy OpenShift in disconnected settings with installer.

Dependencies

Guided installation onboarding design from UXD team.
UI development

Epic AGENT-408: GUI backend services

View the Description

Epic Goal

Have a friendly graphical user interface to perform an interactive installation that runs on node0

Why is this important?

Allows the WebUI to run in Agent based installation where we can only count on node0 to run it
Provides a familiar (close to SaaS) interface to walk through the first cluster installation
Interactive installation takes us closer to having generated images that serve multiple first cluster installations

Scenarios

As an admin, I want to generate an ISO that I can send to the field to perform a friendly, interactive installation

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
...

Dependencies (internal and external)

Assisted-Service WebUI needs an Agent based installation wizard

Open questions::

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Task AGENT-557: Allow Cluster/InfraEnv registration without configuration

Modify the cluster registration code in the assisted-service client (used by create-cluster-and-infraenv.service) to allow creating the cluster given only the following config manifests:

ClusterImageSet
InfraEnv

If the following manifests are present, data from them should be used:

AgentPullSecret
NMStateConfig
extra manifests

Other manifests (ClusterDeployment, AgentClusterInstall) will not be present in an interactive install, and the information therein will be entered via the GUI instead.

A CLI flag or environment variable can be used to select the interactive mode.
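A minimal Go sketch of the mode selection described above. The `--interactive` flag and the `AGENT_INTERACTIVE_INSTALL` environment variable are hypothetical names (the task does not name them), and the rule that the automated flow also requires ClusterDeployment and AgentClusterInstall is an assumption drawn from the paragraph above.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// interactiveEnvVar and "--interactive" are hypothetical names used only for
// illustration; the task does not specify the real flag or variable.
const interactiveEnvVar = "AGENT_INTERACTIVE_INSTALL"

// requiredManifests must always be present to register the cluster and InfraEnv.
// AgentPullSecret, NMStateConfig and extra manifests are optional and only used
// when present, so they are not checked here.
var requiredManifests = []string{"ClusterImageSet", "InfraEnv"}

// isInteractive reports whether interactive mode was selected through either
// the command-line flag or the environment variable.
func isInteractive(args []string, getenv func(string) string) bool {
	for _, a := range args {
		if a == "--interactive" {
			return true
		}
	}
	return strings.EqualFold(getenv(interactiveEnvVar), "true")
}

// missingManifests lists required manifests that are absent. In the automated
// flow (assumed here) ClusterDeployment and AgentClusterInstall are also
// required; in interactive mode their data comes from the GUI instead.
func missingManifests(present map[string]bool, interactive bool) []string {
	required := append([]string{}, requiredManifests...)
	if !interactive {
		required = append(required, "ClusterDeployment", "AgentClusterInstall")
	}
	var missing []string
	for _, m := range required {
		if !present[m] {
			missing = append(missing, m)
		}
	}
	return missing
}

func main() {
	interactive := isInteractive(os.Args[1:], os.Getenv)
	present := map[string]bool{"ClusterImageSet": true, "InfraEnv": true}
	fmt.Println("interactive:", interactive, "missing:", missingManifests(present, interactive))
}
```

Passing `getenv` as a parameter keeps the check testable without touching the real process environment.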

Sub-task AGENT-615: Split create-cluster-and-infraenv.service and update dependencies

create-cluster-and-infraenv.service will be split into agent-register-cluster.service and agent-register-infraenv.service.

Any existing systemd service dependency on create-cluster-and-infraenv.service should be moved to agent-register-infraenv.service.

Acceptance Criteria:

Existing automated cluster installation flow should not be impacted.
Update agent-gather to include these service logs

https://github.com/openshift/installer/pull/7364

Feature OCPSTRAT-554: Improving error handling, propagation, collection, and disambiguation for users

To be broken into one feature epic and a spike:

Feature: error type disambiguation and error propagation into the operator status
Spike: general improvement on making errors more actionable for the end user

The MCO today has multiple layers of errors. Generally speaking, there are four locations where an error message can appear, from highest to lowest:

The MCO operator status
The MCPool status
The MCController/Daemon pod logs
The journal logs on the node

Error propagation is generally not one-to-one. The operator status usually captures the pool status, but the full error from the Controller/Daemon does not fully bubble up to the pool/operator, and errors in the journal logs generally do not bubble up at all. This is very confusing for customers/admins who work with the MCO without a full understanding of its internal mechanics:

The real error is hard to find
The error message is often generic and ambiguous
The solution/workaround is not clear at all

Using “unexpected on-disk state” as an example, this can be caused by any of the following:

An incomplete update happened, and something rebooted the node
The node upgrade succeeded until the rpm-ostree step, which failed and atomically rolled back
The user modified something manually
Another operator modified something manually
Some other service/network manager overwrote something the MCO writes

And so on.

Since error use cases are wide and varied, there are many improvements we can make for each individual error state. This epic proposes targeted improvements specifically to error messaging and propagation. The goals are:

Disambiguating different error cases that share the same message
Adding more error catching, including journal logs and rpm-ostree errors
Propagating full error messages further up the stack, up to the operator status, in a clear manner
Adding actionable fix/information messages alongside the error message
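To illustrate what disambiguating errors that share a message could look like, here is a Go sketch using standard-library error wrapping. The `DiskStateError` type, the sentinel errors, and the hint text are all hypothetical; the idea is that the generic "unexpected on-disk state" wording can be kept while the error carries a specific, checkable cause and an actionable hint that can be propagated up to the pool/operator status.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors: one generic message and some specific causes.
var (
	ErrUnexpectedOnDiskState = errors.New("unexpected on-disk state")
	ErrRpmOstreeRolledBack   = errors.New("rpm-ostree update failed and rolled back")
	ErrFileModified          = errors.New("file modified outside the MCO")
)

// DiskStateError keeps the generic message but records the specific cause
// and a suggested next step for the admin.
type DiskStateError struct {
	Path  string
	Cause error
	Hint  string
}

func (e *DiskStateError) Error() string {
	return fmt.Sprintf("%v: %s: %v", ErrUnexpectedOnDiskState, e.Path, e.Cause)
}

// Unwrap exposes the specific cause to errors.Is / errors.As.
func (e *DiskStateError) Unwrap() error { return e.Cause }

// Is makes the error also match the generic sentinel.
func (e *DiskStateError) Is(target error) bool { return target == ErrUnexpectedOnDiskState }

// statusMessage renders the full error plus the actionable hint, i.e. the
// text that would be propagated up to the pool and operator status.
func statusMessage(err error) string {
	var dse *DiskStateError
	if errors.As(err, &dse) && dse.Hint != "" {
		return fmt.Sprintf("%v (suggested action: %s)", err, dse.Hint)
	}
	return err.Error()
}

func main() {
	err := fmt.Errorf("syncing node: %w", &DiskStateError{
		Path:  "/etc/example.conf",
		Cause: ErrFileModified,
		Hint:  "check for manual edits or another component writing this file",
	})
	// Callers can test for the generic case and for the precise cause.
	fmt.Println(errors.Is(err, ErrUnexpectedOnDiskState), errors.Is(err, ErrFileModified), errors.Is(err, ErrRpmOstreeRolledBack))
	fmt.Println(statusMessage(err))
}
```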

A secondary objective is observability, including reporting items such as the following all the way up to the operator status:

Reporting the status of all pools
Pointing out current status of update/upgrade per pool
What the update/upgrade is blocking on
How to unblock the upgrade

Approaches can include:

Better error messaging, starting with common error cases
Disambiguating config mismatches
Capturing rpm-ostree logs from the previous boot in case of osimageurl mismatch errors
Capturing the full daemon error message back to the pool/operator status
Adding a new field to the MCO operator spec that attempts to suggest fixes, or where to look next, when an error occurs
Adding better alerting messages for MCO errors

Epic MCO-1: Observability Infrastructure and Enhanced metrics in MCO

It has become clear over time that we need to enhance most of the existing MCO metrics and add more metrics related to the MCC. The MCC is tasked with watching what is going on with pools, so it makes sense to add more metrics and alerting there in particular. We have run into, and continue to run into, various problems with metrics. This epic aims to address those and to start adding more useful metrics and alerting to the MCO. A further aim of this epic (which could be split out) is to provide more data to help us proactively debug clusters when things go wrong.

After the spike, the metric enhancement work is split as follows:

Expose more pool health metrics: (1) expose metrics in the MCD to enable a node watcher; (2) expose metrics in the MCO to enable an MCP watcher; (3) expose metrics in the MCC, especially for the MCC sub-controllers, to enable a comprehensive watcher on nodes, pools, and configs
Migrate the metric backend from oauth-proxy to kube-rbac-proxy
Reorganize the metric infrastructure so that it is ready for customization and CRD consumption
- This part of the work was originally prioritized and put under construction with a design focused on metric centralization: with the introduction of the state controller in MCO-452, the MCO will use the state controller as a central place for registering, listening to, and reporting metrics. All other MCO sub-components will report to the state controller when there is an update. This unified infrastructure gives the user a single entry point for configuring all metrics at once. [USE CASE: the user can pass in a CRD listing all the metrics they want to turn on; the state controller then interprets and syncs these customer-defined requirements and enables the corresponding metrics.]
- However, the implementation is severely delayed due to (1) the redesign of the state controller (see updates for MCO-690) and (2) the redesign of the message bus between the MCO sub-components and the state controller (see updates for MCO-751).
- It is no longer in scope for 4.15 and will be tracked in MCO-846 for 4.16.

Bug OCPBUGS-20427: No datapoint found when querying up certain registered metrics

Description of problem:

When querying registered metrics in Console/Observe/Metrics, certain metrics do not show up (the query returns "No datapoints found"). These metrics include mcc_drain_err and mcc_state, among others.

Version-Release number of selected component (if applicable):

How reproducible:

100%

Steps to Reproduce:

1. Start up a cluster.
2. Run a query, e.g. mcc_drain_err, in Console/Observe/Metrics.

Actual results:

No datapoints found

Expected results:

Numerical result

Additional info:

This only happens to metrics that are defined/registered with the GaugeVec type. It turns out that any vector metric (GaugeVec, CounterVec) must be initialized; otherwise it does not show up until it is first updated: https://github.com/thought-machine/please-servers/pull/258
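The behaviour described above can be reproduced with a small toy model in Go. This uses only the standard library and is not the Prometheus client library: a child series of the vector exists only once its label value has been touched, so pre-initializing the known label values at registration time makes the series visible with a value of 0 before the first real update. The `node` label name is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// gaugeVec is a toy stand-in for a vector gauge: a child series is only
// exported once some code has touched its label value.
type gaugeVec struct {
	name, label string
	children    map[string]float64 // label value -> current value
}

func newGaugeVec(name, label string) *gaugeVec {
	return &gaugeVec{name: name, label: label, children: map[string]float64{}}
}

// withLabelValue creates the child at 0 if it does not exist yet.
func (g *gaugeVec) withLabelValue(v string) {
	if _, ok := g.children[v]; !ok {
		g.children[v] = 0
	}
}

// set updates (and implicitly creates) a child series.
func (g *gaugeVec) set(v string, val float64) {
	g.children[v] = val
}

// collect returns the exported series, sorted for stable output.
func (g *gaugeVec) collect() []string {
	out := []string{}
	for v, val := range g.children {
		out = append(out, fmt.Sprintf("%s{%s=%q} %g", g.name, g.label, v, val))
	}
	sort.Strings(out)
	return out
}

func main() {
	drainErr := newGaugeVec("mcc_drain_err", "node")
	fmt.Println(len(drainErr.collect())) // nothing exported yet: "No datapoints found"

	// Initialize the known label values when the metric is registered.
	for _, node := range []string{"worker-0", "worker-1"} {
		drainErr.withLabelValue(node)
	}
	fmt.Println(drainErr.collect())
}
```

With client_golang, the equivalent step would be calling `WithLabelValues(...)` on the vector (for example followed by `.Set(0)`) for each known label combination when the metric is registered.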

Story MCO-681: Add Key State Metrics

Add the following types of metrics in the proper places in the MCO:

State for each controller in the MCC
MCC update time for the previous update
Timestamps for MCD config drifts
Timestamps for pivot error metrics

This will involve registering new metrics and making sure that they are updated when key events occur.

Feature OCPSTRAT-663: Enable BYOK for IBM Cloud VPC

Feature Overview (aka. Goal Summary)

Enable support to bring your own encryption key (BYOK) for OpenShift on IBM Cloud VPC.

Goals (aka. expected user outcomes)

As a user, I want to be able to provide my own encryption key when deploying OpenShift on IBM Cloud VPC, so that the cluster infrastructure objects (VM instances and storage objects) can use that user-managed key to encrypt their data.

Requirements (aka. Acceptance Criteria):

The Installer will provide a mechanism to specify a user-managed key that will be used to encrypt the data on the virtual machines that are part of the OpenShift cluster as well as any other persistent storage managed by the platform via Storage Classes.

Background

This feature is a required component for IBM's OpenShift replatforming effort.

Documentation Considerations

The feature will be documented as usual to guide users in using their own key to encrypt the data on an OpenShift cluster running on IBM Cloud VPC.

Epic STOR-1357: Review and support the work from IBM to enable BYOK in OpenShift on IBM Cloud VPC

Epic Goal

Review and support the IBM engineering team while enabling BYOK support for OpenShift on IBM Cloud VPC

Why is this important? (mandatory)

This feature is key to the replatforming work IBM is doing for their OpenShift managed service.

Scenarios (mandatory)

All cluster storage objects, VM storage, and storage managed by StorageClass objects defined in the platform will be encrypted using the user-managed key provided in the installation manifest.

Dependencies (internal and external) (mandatory)

https://issues.redhat.com/browse/CORS-2694

Contributing Teams (and contacts) (mandatory)

The IBM development team will be responsible for developing this feature, and the RH engineering team will review and support their work.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

CI Testing - Basic e2e automation tests are merged and completing successfully
Documentation - Content development is complete.
QE - Test scenarios are written and executed successfully.
Technical Enablement - Slides are complete (if requested by PLM)
Engineering Stories Merged
All associated work items with the Epic are closed
Epic status should be “Release Pending”

Story OCPCLOUD-2264: Provide BYOK encryption support for OpenShift on IBM Cloud VPC (MAPI)

User Story

As a user, I am able to provide my own key for boot volume encryption when deploying OpenShift on IBM Cloud VPC.

Background

Provide encryption key to infrastructure provisioning for worker nodes.

https://github.com/openshift/machine-api-provider-ibmcloud/pull/27

Story STOR-1487: Provide BYOK encryption support for OpenShift on IBM Cloud VPC (API)

User Story

As a user, I am able to provide my own key for disk encryption when deploying OpenShift on IBM Cloud VPC.

Background

Add IBM Cloud VPC specific config spec with fields for encryption keys.

Epic CORS-2694: Review and support the work from IBM to enable BYOK in OpenShift on IBM Cloud VPC

Epic Goal

Review and support the IBM engineering team while enabling BYOK support for OpenShift on IBM Cloud VPC

Why is this important?

This feature is key to the replatforming work IBM is doing for their OpenShift managed service.

Scenarios

The installer will allow the user to provide their own key information, to be used to encrypt the VM storage and any storage objects managed by OpenShift StorageClass objects.

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Story CORS-2934: Provide BYOK encryption support for OpenShift on IBM Cloud VPC (Installer)

User Story:

As a user, I am able to provide my own key for boot volume encryption when deploying OpenShift on IBM Cloud VPC.

Acceptance Criteria:

User provides key in install config and proceeds with installation
Boot volumes of control plane nodes are encrypted accordingly

Engineering Details:

Provide the encryption key to the infrastructure for provisioning the control plane nodes.

Story CORS-3003: CI implementation: Review and support the work from IBM to enable BYOK in OpenShift on IBM Cloud VPC

https://github.com/openshift/installer/pull/7811

Feature OCPSTRAT-680: Integrate Cluster API (CAPI) in standalone OCP-Phase 2

Feature Overview (aka. Goal Summary)

Overarching Goal
Move to using the upstream Cluster API (CAPI) in place of the current implementation of the Machine API for standalone OpenShift.
Phases 1 and 2 cover implementing base functionality for CAPI.
Phase 2 also covers migrating MAPI resources to CAPI.

Phase 2 Goal:

Complete the design of the Cluster API (CAPI) architecture and build the core operator logic
Attach and detach internal and external load balancers for control plane machines on AWS, Azure, GCP, and other relevant platforms
Manage the lifecycle of Cluster API components within OpenShift standalone clusters
E2E tests

For Phase 1: incorporate the assets from different repositories to simplify asset management.

Background, and strategic fit

Initially, CAPI did not meet OCP's requirements for cluster/machine management, but the project has since moved on, and CAPI is now a better fit and has better community involvement.
CAPI has much better community interaction than MAPI.
Other projects are considering using CAPI, and it would be cleaner to have one solution.
Long term, it will allow us to add new features more easily in one place vs. doing this in multiple places.

Acceptance Criteria

Epic OCPCLOUD-1618: Install Cluster API into OpenShift Standalone Clusters (Technology Preview)

Epic Goal

To create an operator to manage the lifecycle of Cluster API components within OpenShift standalone clusters

Why is this important?

We need to be able to install and lifecycle the Cluster API ecosystem within standalone OpenShift
We need to make sure that we can update the components via an operator
We need to make sure that we can lifecycle the APIs via an operator

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.

Bug OCPBUGS-19841: User data secrets should include format ignition

Description of problem:


Cluster API expects the user data secret created by the installer to contain a `format: ignition` value. This is used by the various providers to identify that they should treat the user data as ignition.

We do not currently set this value, but we should.

This may cause issues with certain providers that expect ignition to be uploaded to blob storage. We should identify the behaviour of the providers we care about (AWS, Azure, GCP, vSphere) and ensure that they behave well and continue to work with this change. (This may mean upstream changes to only upload large ignitions to the storage, or to make the storage optional.)
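A minimal Go sketch of the secret data this bug asks for, assuming the ignition payload lives under a `value` key next to the `format` key, as in Cluster API bootstrap data secrets. The helper name and the sample ignition content are illustrative only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// userDataSecret builds the data map for a user data secret. The "format"
// key tells Cluster API providers to treat the payload as ignition; the
// "value" key for the payload is an assumption based on CAPI conventions.
func userDataSecret(ignition []byte) map[string][]byte {
	return map[string][]byte{
		"value":  ignition,
		"format": []byte("ignition"),
	}
}

func main() {
	// Sample (hypothetical) ignition config.
	ign, err := json.Marshal(map[string]interface{}{
		"ignition": map[string]string{"version": "3.2.0"},
	})
	if err != nil {
		panic(err)
	}
	data := userDataSecret(ign)
	fmt.Printf("format=%s value=%s\n", data["format"], data["value"])
}
```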

https://github.com/openshift/cluster-capi-operator/pull/140

Bug OCPBUGS-19842: Cannot delete CAPI Cluster in non-CAPI namespace

Description of problem:

I would like to use OpenShift clusters as management clusters to create other guest clusters.
This is not currently possible because of restrictions imposed by the validating webhook for the Cluster object, which prevents Cluster objects from being deleted.
The webhook _should_ only apply to the `openshift-cluster-api` namespace, but currently applies to all namespaces.
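A hypothetical Go sketch of the intended validation logic, in which deletion is only rejected for Cluster objects in `openshift-cluster-api`. The function name and error text are illustrative and not taken from cluster-capi-operator. An alternative design is to scope the webhook itself with a `namespaceSelector` in the ValidatingWebhookConfiguration, so that requests for other namespaces never reach it.

```go
package main

import "fmt"

// managedNamespace is the only namespace the webhook should protect.
const managedNamespace = "openshift-cluster-api"

// validateClusterDelete blocks deletion only for the Cluster object the
// operator manages; guest Clusters in other namespaces can be deleted.
func validateClusterDelete(namespace string) error {
	if namespace == managedNamespace {
		return fmt.Errorf("deletion of Cluster objects in namespace %q is not allowed", namespace)
	}
	return nil
}

func main() {
	for _, ns := range []string{"openshift-cluster-api", "guest-clusters"} {
		fmt.Printf("%s: %v\n", ns, validateClusterDelete(ns))
	}
}
```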

Version-Release number of selected component (if applicable):

How reproducible:

100%

Steps to Reproduce:

1. Create a Cluster object in a namespace that isn't openshift-cluster-api
2. Attempt to delete the Cluster object you just created

Actual results:

Error, cannot delete Cluster

Expected results:

Cluster object deleted successfully

Additional info:

https://github.com/openshift/cluster-capi-operator/pull/135

Feature OCPSTRAT-698: [GA] Support static IP assignments with vSphere IPI

Goal

As an OpenShift on vSphere administrator, I want to specify static IP assignments to my VMs.

As an OpenShift on vSphere administrator, I want to completely avoid using a DHCP server for the VMs of my OpenShift cluster.

Why is this important?

Customers want the convenience of IPI deployments for vSphere without having to use DHCP. As with bare metal, where METAL-1 added this capability, the reasons include the security implications of DHCP (customers report, for example, that depending on the configuration, DHCP allows any device to get onto the network). At the same time, IPI deployments only require our OpenShift installation software, whereas with UPI customers need automation software that, in secure environments, they would have to certify along with OpenShift.

Acceptance Criteria

I can specify static IPs for node VMs at install time with IPI

Previous Work

Bare metal related work:

CoreOS Afterburn:

https://github.com/coreos/afterburn/blob/main/src/providers/vmware/amd64.rs#L28

https://github.com/openshift/installer/blob/master/upi/vsphere/vm/main.tf#L34

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Epic SPLAT-1143: [GA] Support static IP assignments with vSphere IPI

Goal

As an OpenShift on vSphere administrator, I want to specify static IP assignments to my VMs.

As an OpenShift on vSphere administrator, I want to completely avoid using a DHCP server for the VMs of my OpenShift cluster.

Why is this important?

Acceptance Criteria

I can specify static IPs for node VMs at install time with IPI

Previous Work

Bare metal related work:

CoreOS Afterburn:

https://github.com/coreos/afterburn/blob/main/src/providers/vmware/amd64.rs#L28

https://github.com/openshift/installer/blob/master/upi/vsphere/vm/main.tf#L34

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Story SPLAT-1172: Enhance machine status relating to IPAddressClaimed

USER STORY:

As a cluster admin, I find the status field relating to IPAddressClaimed confusing; it should be improved to make it easier to understand.

DESCRIPTION:

Currently the machine object has the following status:

  providerStatus:
    conditions:
    - lastTransitionTime: "2023-09-04T17:50:34Z"
      message: All IP address claims are bound
      reason: WaitingForIPAddress
      status: "False"
      type: IPAddressClaimed

The reason, status, and message are confusing once the IP address claims are bound: the example above is what the condition reports after claiming has finished, yet it shows status "False" with reason WaitingForIPAddress.

ACCEPTANCE CRITERIA:

The status should look something like the following when IP is claimed:

    conditions:
    - lastTransitionTime: "2023-09-06T13:52:51Z"
      message: All IP address claims are bound
      reason: IPAddressesClaimed
      status: "True"
      type: IPAddressClaimed

The reason text may change to match the formatting of other condition fields.
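The acceptance criteria can be expressed as a small Go sketch that derives the condition from how many claims are bound. The struct, the helper, and the wording of the "waiting" message are hypothetical; `lastTransitionTime` is left out because it should only change when the status actually flips.

```go
package main

import "fmt"

// condition mirrors the type/status/reason/message fields of the
// providerStatus conditions shown above.
type condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// ipAddressClaimedCondition reports True/IPAddressesClaimed once every claim
// is bound, and False/WaitingForIPAddress otherwise.
func ipAddressClaimedCondition(bound, total int) condition {
	c := condition{Type: "IPAddressClaimed"}
	if bound == total {
		c.Status = "True"
		c.Reason = "IPAddressesClaimed"
		c.Message = "All IP address claims are bound"
		return c
	}
	c.Status = "False"
	c.Reason = "WaitingForIPAddress"
	c.Message = fmt.Sprintf("Waiting for IP address claims: %d of %d bound", bound, total)
	return c
}

func main() {
	fmt.Printf("%+v\n", ipAddressClaimedCondition(2, 2))
	fmt.Printf("%+v\n", ipAddressClaimedCondition(1, 2))
}
```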

ENGINEERING DETAILS:

This will most likely involve updating a few different projects:

openshift/machine-api-operator
openshift/api

https://github.com/openshift/machine-api-operator/pull/1166

Feature OCPSTRAT-701: Use vSphere credentials from install-config in Agent

Configure the vSphere integration with the credentials specified in the install-config.yaml file used by the agent install ISO image, so that the platform integration is configured on day 1 with the agent-based installer.

Epic AGENT-545: Use vSphere credentials from install-config

Stop ignoring any (optional) vSphere credential values provided in the install-config, and pass them instead to the cluster. This allows users to configure the credentials at day 1 if they want to, though it should remain optional. It also means that an install-config usable for IPI (where the credentials are required) should result in an equivalent cluster when the agent installer is used.

Currently there are warning messages logged about these values being ignored when provided. These warnings should be removed when the values are no longer ignored.

In the absence of direct API support for this in assisted, we should be able to use install-config overrides (which the agent installer is already able to make use of internally).

Task AGENT-723: Use json.Marshal to correctly handle unmarshalling of vSphere Username

The JSON attribute is named "user", but because the yaml.Marshal function is used, the JSON tag is not honoured, so the value is not recognized as vSphere.vcenters.Username (it is emitted as "username", which strict unmarshaling rejects, as the log below shows).

The fix is to change yaml.Marshal to json.Marshal in internal/installconfig/builder GetInstallConfig.
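The failure mode can be reproduced with the Go standard library alone. `VCenter` below is a simplified, hypothetical stand-in for the install-config type whose `Username` field carries the JSON tag `user`, and `strictDecode` mimics strict unmarshaling by rejecting unknown fields. `json.Marshal` honours the tag, while a serializer that ignores JSON tags and lowercases field names emits `username`, which strict decoding rejects with the same `json: unknown field "username"` error seen in the log.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// VCenter is a simplified stand-in: the Go field is Username, but its JSON
// name is "user".
type VCenter struct {
	Server   string `json:"server"`
	Username string `json:"user"`
	Password string `json:"password"`
}

// strictDecode rejects unknown fields, similar to strict unmarshaling.
func strictDecode(data []byte) (VCenter, error) {
	var v VCenter
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	err := dec.Decode(&v)
	return v, err
}

func main() {
	// json.Marshal honours the json tags, so the key is "user".
	good, err := json.Marshal(VCenter{Server: "vcenter.example.com", Username: "admin", Password: "secret"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(good))
	_, err = strictDecode(good)
	fmt.Println(err)

	// Output of a tag-ignoring serializer: the key becomes "username".
	_, err = strictDecode([]byte(`{"server":"vcenter.example.com","username":"admin"}`))
	fmt.Println(err)
}
```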

time="2023-09-25T21:29:07Z" level=error msg="error running openshift-install create manifests, stdout: level=warning msg=failed to parse first occurrence of unknown field: failed to unmarshal install-config.yaml: error unmarshaling JSON: while decoding JSON: json: unknown field \"username\"\nlevel=info msg=Attempting to unmarshal while ignoring unknown keys because strict unmarshaling failed\nlevel=error msg=failed to fetch Master Machines: failed to load asset \"Install Config\": failed to create install config: invalid \"install-config.yaml\" file: [platform.vsphere.vcenters.username: Required value: must specify the username, platform.vsphere.failureDomains.server: Invalid value: \"vcenterplaceholder\": server does not exist in vcenters]\n" func="github.com/openshift/assisted-service/internal/ignition.(*installerGenerator).runCreateCommand" file="/src/internal/ignition/ignition.go:1688" cluster_id=aad05e8e-a9fe-4f60-b580-0b2f6c4fdf10 error="exit status 3" go-id=1569 request_id=
time="2023-09-25T21:29:07Z" level=error msg="failed generating install config for cluster aad05e8e-a9fe-4f60-b580-0b2f6c4fdf10" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).generateClusterInstallConfig" file="/src/internal/bminventory/inventory.go:1755" cluster_id=aad05e8e-a9fe-4f60-b580-0b2f6c4fdf10 error="error running openshift-install manifests,  level=warning msg=failed to parse first occurrence of unknown field: failed to unmarshal install-config.yaml: error unmarshaling JSON: while decoding JSON: json: unknown field \"username\"\nlevel=info msg=Attempting to unmarshal while ignoring unknown keys because strict unmarshaling failed\nlevel=error msg=failed to fetch Master Machines: failed to load asset \"Install Config\": failed to create install config: invalid \"install-config.yaml\" file: [platform.vsphere.vcenters.username: Required value: m <TRUNCATED>: exit status 3" go-id=1569 pkg=Inventory request_id=

https://github.com/openshift/assisted-service/pull/5592

Task AGENT-718: Add vSphere credentials to install-config overrides

The vSphere credentials need to be passed through to assisted-service. The AgentClusterInstall ZTP manifest has an annotation where the install-config overrides can be set.
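A minimal Go sketch of passing the credentials through as an install-config override, assuming the annotation key `agent-install.openshift.io/install-config-overrides` and a JSON fragment shaped like the install-config `platform.vsphere.vcenters` section. The types are simplified and hypothetical (a real vCenter entry has more fields). Returning nil when no credentials are given keeps them optional, and marshaling with `json.Marshal` preserves the `user` key.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// overridesAnnotation is assumed to be the AgentClusterInstall annotation
// that assisted-service reads install-config overrides from.
const overridesAnnotation = "agent-install.openshift.io/install-config-overrides"

// vcenter is a simplified, hypothetical vCenter credentials entry.
type vcenter struct {
	Server   string `json:"server"`
	User     string `json:"user"`
	Password string `json:"password"`
}

// override is the partial install-config that is merged by assisted-service.
type override struct {
	Platform struct {
		VSphere struct {
			VCenters []vcenter `json:"vcenters"`
		} `json:"vsphere"`
	} `json:"platform"`
}

// vsphereOverrideAnnotations returns the annotations to set on the
// AgentClusterInstall, or nil when no credentials were provided.
func vsphereOverrideAnnotations(vcs []vcenter) (map[string]string, error) {
	if len(vcs) == 0 {
		return nil, nil
	}
	var o override
	o.Platform.VSphere.VCenters = vcs
	b, err := json.Marshal(o) // json.Marshal keeps the "user" key
	if err != nil {
		return nil, err
	}
	return map[string]string{overridesAnnotation: string(b)}, nil
}

func main() {
	ann, err := vsphereOverrideAnnotations([]vcenter{{Server: "vcenter.example.com", User: "admin", Password: "secret"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(ann[overridesAnnotation])
}
```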

Acceptance Criteria:

vSphere CSI network driver is installed when cluster is installed

https://github.com/openshift/installer/pull/7593

Feature OCPSTRAT-702: Agent baremetal platform - use per-host config values from install-config

Use any values provided in the install-config (e.g. root device hints, network config, BMC details) as defaults for the agent-config for the baremetal platform when installing with the agent-based installer.

If the agent-config file specifies host-specific settings then these should override the install-config. This enables users to use the same config for both agent-based and IPI installation.

Epic AGENT-677: Pass BMC details to cluster if provided

If the user has an IPI install-config complete with BMC credentials, pass them through to the cluster so that it will end up with BareMetalHosts that can be managed by MAPI just as they would after an IPI install, instead of then having to add the credentials again on day 2.

BMC credentials must remain optional in the install-config though.

Task AGENT-740: Add BMC fields to assisted-service baremetal host definition

https://github.com/openshift/assisted-service/pull/5675

Story AGENT-739: Include BMC details in install-config override if provided

User Story:

https://github.com/openshift/installer/pull/7645

Epic AGENT-552: Default per-host config to values in baremetal install-config

We allow the user to specify per-host settings (e.g. root device hints, network config) in the agent-config file, on any platform.

However, if the platform is baremetal, there are also fields for this data in the platform section of the install-config. Currently any data specified there is ignored (and a warning about this is logged).

We should use any values provided in the install-config as defaults for the agent-config. If the agent-config specifies host-specific settings then these should override the install-config. This enables users to use the same config for both agent-based and IPI installation.

As part of this work, the logs warning of unused values should be removed.

Story AGENT-713: Use install-config baremetal host config if not defined in agent-config

If install-config.yaml contains baremetal host configuration fields and no host fields are defined in agent-config.yaml, use the install-config settings to define the hosts. The following fields will be copied over from the baremetal host definition:
https://github.com/openshift/installer/blob/master/pkg/types/baremetal/platform.go#L37

name -> hostname
role -> role
rootDeviceHints -> rootDeviceHints
networkConfig -> networkConfig

The install-config struct does not have an interfaces field matching agent-config; instead, the bootMacAddress of each install-config host will be used to create the interfaces array.

The hardwareProfile and bootMode fields will not be copied.
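A Go sketch of the fallback and field mapping described above. Both host types are simplified, hypothetical versions of the install-config and agent-config structs, and the interface name `eth0` is a placeholder chosen for illustration, since the story only says that the boot MAC address is used to build the interfaces array.

```go
package main

import "fmt"

// installConfigHost is a simplified, hypothetical install-config baremetal host.
type installConfigHost struct {
	Name            string
	Role            string
	BootMACAddress  string
	RootDeviceHints map[string]string
	NetworkConfig   string
	HardwareProfile string // not copied
	BootMode        string // not copied
}

type agentInterface struct {
	Name       string
	MACAddress string
}

// agentHost is a simplified, hypothetical agent-config host.
type agentHost struct {
	Hostname        string
	Role            string
	Interfaces      []agentInterface
	RootDeviceHints map[string]string
	NetworkConfig   string
}

// hostsFromInstallConfig keeps the agent-config hosts if any are defined;
// otherwise it converts the install-config hosts field by field.
func hostsFromInstallConfig(agentHosts []agentHost, icHosts []installConfigHost) []agentHost {
	if len(agentHosts) > 0 {
		return agentHosts // agent-config takes precedence
	}
	out := make([]agentHost, 0, len(icHosts))
	for _, h := range icHosts {
		out = append(out, agentHost{
			Hostname: h.Name, // name -> hostname
			Role:     h.Role,
			// The boot MAC address becomes the only interface; "eth0" is a placeholder.
			Interfaces:      []agentInterface{{Name: "eth0", MACAddress: h.BootMACAddress}},
			RootDeviceHints: h.RootDeviceHints,
			NetworkConfig:   h.NetworkConfig,
		})
	}
	return out
}

func main() {
	ic := []installConfigHost{{Name: "master-0", Role: "master", BootMACAddress: "52:54:00:aa:bb:cc"}}
	fmt.Printf("%+v\n", hostsFromInstallConfig(nil, ic))
}
```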

https://github.com/openshift/installer/pull/7531

Feature OCPSTRAT-706: [Tech Prev] Add ControlPlaneMachineSet for vSphere

Feature Overview (aka. Goal Summary)

Add ControlPlaneMachineSet for vSphere

Epic SPLAT-1110: [Tech Preview] Add ControlPlaneMachineSet support for vSphere

Goal

Add ControlPlaneMachineSet for vSphere

Task SPLAT-1211: [vsphere] split infrastructure plumbing into its own PR

Nutanix is performing work in parallel, and we need to pull the common bits out into a non-vSphere-specific PR.

https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/250

Spike SPLAT-1129: [vsphere] implement control plane machineset reconciliation

Investigate upstream impacts.

https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/228

Task SPLAT-1141: [vsphere] plumb infrastructure resource with template for failure domains

Add the name of the template to the infrastructure failure domain generated by the installer.

https://github.com/openshift/installer/pull/7418

Task SPLAT-1127: [vsphere] update OpenShift API

The OpenShift API needs to be updated to define VSphereFailureDomain. A draft PR is here: https://github.com/openshift/api/pull/1539

Also, ensure that the client-go and openshift-cluster-config-operator projects are bumped once the API changes merge.

https://github.com/openshift/cluster-config-operator/pull/342

Epic CORS-2726: Add ControlPlaneMachineSet support for vSphere

Goal

Add ControlPlaneMachineSet for vSphere

Task SPLAT-1277: [vsphere] generate control plane machineset in the installer

https://github.com/openshift/installer/pull/7780

Feature OCPSTRAT-715: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Epic OCPCLOUD-2178: Rebase Cluster Infrastructure Components onto 1.28

Epic Goal

Cluster Infrastructure-owned components should be running on Kubernetes 1.28
This includes:
- The cluster autoscaler (+operator)
- Machine API operator
  - Machine API controllers for:
    - AWS
    - Azure
    - GCP
    - vSphere
    - OpenStack
    - IBM
    - Nutanix
- Cloud Controller Manager Operator
  - Cloud controller managers for:
    - AWS
    - Azure
    - GCP
    - vSphere
    - OpenStack
    - IBM
    - Nutanix
- Cluster Machine Approver
- Cluster API Actuator Package
- Control Plane Machine Set Operator

Why is this important?

Scenarios

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
...

Dependencies (internal and external)

Previous Work (Optional):

Open questions:

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Task OCPCLOUD-2191: Rebase/update to K8s 1.28 for Machine API Provider GCP

To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository.
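For a repository that is a plain Go module rather than a fork of an upstream project, the "updating" path largely comes down to moving the `k8s.io` staging modules to the matching `v0.<minor>` line (Kubernetes 1.28 corresponds to `v0.28.x`). The following is a minimal shell sketch under that assumption; the function names, the choice of modules, and the patch level are illustrative, and setting `DRY_RUN=1` only prints the commands.

```shell
# Sketch: bump the core k8s.io staging modules of a Go module to a Kubernetes
# minor release. ASSUMPTION: a plain Go module (not a git-rebased fork) whose
# direct k8s.io dependencies are the three modules below. Names are hypothetical.

# k8s_module_version MINOR [PATCH]: "1.28" "3" -> "v0.28.3"
k8s_module_version() {
  printf 'v0.%s.%s\n' "${1#1.}" "${2:-0}"
}

# _run: execute a command, or only print it when DRY_RUN=1
_run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# bump_k8s_deps MINOR [PATCH]: update the modules, tidy go.mod, refresh vendor/
bump_k8s_deps() {
  ver=$(k8s_module_version "$1" "${2:-0}")
  _run go get "k8s.io/api@$ver" "k8s.io/apimachinery@$ver" "k8s.io/client-go@$ver" || return 1
  _run go mod tidy || return 1
  if [ -d vendor ]; then
    _run go mod vendor    # only for repositories that vendor their dependencies
  fi
}
```

From a repository root, `DRY_RUN=1 bump_k8s_deps 1.28` previews the commands. Repositories that carry a fork of an upstream project would instead need the rebase path, which this sketch does not cover.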

https://github.com/openshift/machine-api-provider-gcp/pull/59

Task OCPCLOUD-2196: Rebase/update to K8s 1.28 for Cluster Autoscaler Operator

To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository.

https://github.com/openshift/cluster-autoscaler-operator/pull/293

Task OCPCLOUD-2187: Rebase/update to K8s 1.28 for Cloud Provider AWS

To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository.

https://github.com/openshift/cloud-provider-aws/pull/47

Task OCPCLOUD-2182: Rebase/update to K8s 1.28 for Cloud Provider Nutanix

To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository.

https://github.com/openshift/cloud-provider-nutanix/pull/20

Task OCPCLOUD-2192: Rebase/update to K8s 1.28 for Machine API Provider Azure

To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository.

https://github.com/openshift/machine-api-provider-azure/pull/76

Task OCPCLOUD-2184: Rebase/update to K8s 1.28 for Cloud Provider vSphere

To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository.

https://github.com/openshift/cloud-provider-vsphere/pull/42

Task OCPCLOUD-2181: Rebase/update to K8s 1.28 for Cluster Machine Approver

To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository.

https://github.com/openshift/cluster-machine-approver/pull/203

Task OCPCLOUD-2190: Rebase/update to K8s 1.28 for Machine API Provider IBM

To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository.

https://github.com/openshift/machine-api-provider-ibmcloud/pull/25

Task OCPCLOUD-2193: Rebase/update to K8s 1.28 for Machine API Provider AWS

To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository.

https://github.com/openshift/machine-api-provider-aws/pull/87

Task OCPCLOUD-2183: Rebase/update to K8s 1.28 for Cloud Provider IBM

To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository.

https://github.com/openshift/cloud-provider-ibm/pull/47

Task OCPCLOUD-2189: Rebase/update to K8s 1.28 for Machine API Provider Nutanix

To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository.

https://github.com/openshift/machine-api-provider-nutanix/pull/54

Task OCPCLOUD-2188: Rebase/update to K8s 1.28 for Cluster Cloud Controller Manager Operator

To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository.

https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/285

Task OCPCLOUD-2194: Rebase/update to K8s 1.28 for Machine API Operator

To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository.

https://github.com/openshift/machine-api-operator/pull/1168

Task OCPCLOUD-2195: Rebase/update to K8s 1.28 for Cluster Autoscaler

To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository.

https://github.com/openshift/kubernetes-autoscaler/pull/265

Task OCPCLOUD-2185: Rebase/update to K8s 1.28 for Cloud Provider GCP

To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository.

https://github.com/openshift/cloud-provider-gcp/pull/36

Task OCPCLOUD-2179: Rebase/update to K8s 1.28 for Control Plane Machine Set Operator

To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository.

https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/248

Epic CCO-440: Upgrade to Kubernetes 1.28

Epic Goal

The goal of this epic is to upgrade all OpenShift and Kubernetes components that cloud-credential-operator uses to v1.28, which keeps it on par with the rest of the OpenShift components and the underlying cluster version.

https://github.com/openshift/cloud-credential-operator/pull/614

Epic OCPCLOUD-2213: Rebase Cluster API Components onto 1.27

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

Epic Goal

Cluster Infrastructure owned CAPI components should be running on Kubernetes 1.27.
The target is 4.15, since CAPI is always one release behind upstream.

Why is this important?

Scenarios

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
...

Dependencies (internal and external)

Previous Work (Optional):

Open questions:

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Task OCPCLOUD-2215: Rebase/update to K8s 1.27 for Cluster API Operator

To align with the 4.15 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository.

https://github.com/openshift/cluster-api-operator/pull/25

Task OCPCLOUD-2220: Rebase/update to K8s 1.27 for Cluster API Provider IBM

To align with the 4.15 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository.

https://github.com/openshift/cluster-api-provider-ibmcloud/pull/59

Task OCPCLOUD-2214: Rebase/update to K8s 1.27 for Cluster CAPI Operator

To align with the 4.15 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository.

https://github.com/openshift/cluster-capi-operator/pull/132

Task OCPCLOUD-2216: Rebase/update to K8s 1.27 for Cluster API Provider AWS

To align with the 4.15 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository.

https://github.com/openshift/cluster-api-provider-aws/pull/478

Task OCPCLOUD-2219: Rebase/update to K8s 1.27 for Cluster API Provider vSphere

To align with the 4.15 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository.

https://github.com/openshift/cluster-api-provider-vsphere/pull/18

Task OCPCLOUD-2222: Rebase/update to K8s 1.27 for Core Cluster API

To align with the 4.15 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository.

https://github.com/openshift/cluster-api/pull/181

Task OCPCLOUD-2217: Rebase/update to K8s 1.27 for Cluster API Provider Azure

To align with the 4.15 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository.

https://github.com/openshift/cluster-api-provider-azure/pull/284

Task OCPCLOUD-2218: Rebase/update to K8s 1.27 for Cluster API Provider GCP

To align with the 4.15 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository.

Epic STOR-1383: OCP 4.15 release chores

Epic Goal

Update all images that we ship with OpenShift to the latest upstream releases and libraries.
The exact content to update will be determined as new images are released upstream, which is not known at the beginning of OCP development work. We don't know which new features will be included and will need to be tested and documented; new CSI driver releases in particular may bring new, currently unknown features. We expect the amount of work to be roughly the same as in previous releases. QE or docs can still reject an update if it is too close to a deadline and/or looks too big.

Traditionally we did these updates as bugfixes, because we did them after the feature freeze (FF).

Why is this important?

We want to ship the latest software that contains new features and bugfixes.

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.

Story STOR-1396: Chore: Update vmware-vsphere-csi-driver to the latest release

Update the driver to the latest upstream release. Notify QE and docs of any new features and important bugfixes that need testing or documentation.

(Using separate cards for each driver because these updates can be more complicated)

https://github.com/openshift/vmware-vsphere-csi-driver/pull/97

Story STOR-1408: Chore: Update ibm-vpc-block-csi-driver to the latest release

Update the driver to the latest upstream release. Notify QE and docs of any new features and important bugfixes that need testing or documentation.

This includes ibm-vpc-node-label-updater!

(Using separate cards for each driver because these updates can be more complicated)

Story STOR-1392: Chore: Update gcp-pd-csi-driver to the latest release

Update the driver to the latest upstream release. Notify QE and docs of any new features and important bugfixes that need testing or documentation.

(Using separate cards for each driver because these updates can be more complicated)

https://github.com/openshift/gcp-pd-csi-driver/pull/50

Story STOR-1394: Chore: Update azure-disk-csi-driver to the latest release

Update the driver to the latest upstream release. Notify QE and docs of any new features and important bugfixes that need testing or documentation.

(Using separate cards for each driver because these updates can be more complicated)

https://github.com/openshift/azure-disk-csi-driver/pull/51

Story STOR-1389: Chore: Update alibaba-cloud-csi-driver to the latest release

Update the driver to the latest upstream release. Notify QE and docs of any new features and important bugfixes that need testing or documentation.

(Using separate cards for each driver because these updates can be more complicated)

https://github.com/openshift/alibaba-cloud-csi-driver/pull/41

Story STOR-1402: Chore: update libraries in all operators

Update all OCP and Kubernetes libraries in the storage operators to the appropriate versions for the OCP release.

This includes (but is not limited to):

Kubernetes:
- client-go
- controller-runtime
OCP:
- library-go
- openshift/api
- openshift/client-go
- operator-sdk

Operators:

aws-ebs-csi-driver-operator (in csi-operator)
aws-efs-csi-driver-operator
azure-disk-csi-driver-operator
azure-file-csi-driver-operator
openstack-cinder-csi-driver-operator
gcp-pd-csi-driver-operator
gcp-filestore-csi-driver-operator
csi-driver-manila-operator
vmware-vsphere-csi-driver-operator
alibaba-disk-csi-driver-operator
ibm-vpc-block-csi-driver-operator
csi-driver-shared-resource-operator
ibm-powervs-block-csi-driver-operator
secrets-store-csi-driver-operator

cluster-storage-operator
cluster-csi-snapshot-controller-operator
local-storage-operator
vsphere-problem-detector

EOL, do not upgrade:

github.com/oVirt/csi-driver-operator

Story STOR-1404: Chore: update CSI sidecars

Update all CSI sidecars to the latest upstream release from https://github.com/orgs/kubernetes-csi/repositories

external-attacher
external-provisioner
external-resizer
external-snapshotter
node-driver-registrar
livenessprobe

The corresponding downstream repos have a `csi-` prefix, e.g. github.com/openshift/csi-external-attacher.
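A small shell sketch of that naming convention, mapping each upstream sidecar listed above to its downstream repository. The function name is hypothetical, and the mapping relies only on the `csi-` prefix rule stated here, so any repository that deviates from it would need special-casing.

```shell
# Sketch: map an upstream kubernetes-csi sidecar to its downstream OpenShift
# repository using the "csi-" prefix convention. Function name is hypothetical.
downstream_repo() {
  printf 'github.com/openshift/csi-%s\n' "$1"
}

# Print the mapping for every sidecar in the list above.
for sidecar in external-attacher external-provisioner external-resizer \
               external-snapshotter node-driver-registrar livenessprobe; do
  printf '%s -> %s\n' "$sidecar" "$(downstream_repo "$sidecar")"
done
```

The first line printed is `external-attacher -> github.com/openshift/csi-external-attacher`, matching the example in the text.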

This includes updating the VolumeSnapshot CRDs in the cluster-csi-snapshot-controller-operator assets and the client API in go.mod, i.e. copying all snapshot CRDs from upstream into the operator assets and running `go get -u github.com/kubernetes-csi/external-snapshotter/client/v6` in the operator repo.
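The CRD half of that step might look like the following shell sketch. It assumes a local checkout of kubernetes-csi/external-snapshotter at the target release with the CRDs under `client/config/crd/` named `snapshot.storage.k8s.io_*.yaml`, and an operator assets directory supplied by the caller; the function name and both paths are assumptions to check against the actual repositories.

```shell
# Sketch: copy the VolumeSnapshot CRDs from an upstream external-snapshotter
# checkout into an operator assets directory. ASSUMPTION: upstream keeps them at
# client/config/crd/snapshot.storage.k8s.io_*.yaml. Function name is hypothetical.
sync_snapshot_crds() {
  upstream=$1
  assets=$2
  copied=0
  for crd in "$upstream"/client/config/crd/snapshot.storage.k8s.io_*.yaml; do
    [ -e "$crd" ] || continue    # an unmatched glob stays literal; skip it
    cp "$crd" "$assets"/ || return 1
    echo "copied $(basename "$crd")"
    copied=$((copied + 1))
  done
  if [ "$copied" -eq 0 ]; then
    echo "no snapshot CRDs found under $upstream/client/config/crd" >&2
    return 1
  fi
}
```

For example, `sync_snapshot_crds ../external-snapshotter assets` followed by the `go get -u` step above, where `assets` is a placeholder for the operator's real assets directory. Failing when nothing matches guards against silently copying zero files from a wrong checkout path.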

Story STOR-1400: Chore: Update aws-ebs-csi-driver to the latest release

Update the driver to the latest upstream release. Notify QE and docs of any new features and important bugfixes that need testing or documentation.

(Using separate cards for each driver because these updates can be more complicated)

Story STOR-1390: Chore: Update azure-file-csi-driver to the latest release

Update the driver to the latest upstream release. Notify QE and docs of any new features and important bugfixes that need testing or documentation.

(Using separate cards for each driver because these updates can be more complicated)

https://github.com/openshift/azure-file-csi-driver/pull/43

Epic STOR-1425: Update Control Plane Kubernetes Version to 1.28

Epic Goal

What is our purpose in implementing this? What new capability will be available to customers?

Why is this important? (mandatory)

What are the benefits to the customer or Red Hat? Does it improve security, performance, supportability, etc.? Why is this work a priority?

Scenarios (mandatory)

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.

Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic.

Contributing Teams(and contacts) (mandatory)

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

Development -
Documentation -
QE -
PX -
Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

CI Testing - Basic e2e automation tests are merged and completing successfully
Documentation - Content development is complete.
QE - Test scenarios are written and executed successfully.
Technical Enablement - Slides are complete (if requested by PLM)
Engineering Stories Merged
All associated work items with the Epic are closed
Epic status should be “Release Pending”

Epic WRKLDS-806: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Epic SDN-4119: Rebase Kube version to 1.28 in repos that the SDN team maintains

Rebase help: https://docs.google.com/document/d/1h1XsEt1Iug-W9JRheQas7YRsUJ_NQ8ghEMVmOZ4X-0s/edit

Story SDN-4123: Rebase SDN to 1.28

https://github.com/openshift/sdn/pull/580

Story SDN-4124: Downstream ovn-k to 1.28

https://github.com/openshift/ovn-kubernetes/pull/1965

Feature OCPSTRAT-716: Mixed public/private exposure for OpenShift API and OpenShift Ingress on Azure

Feature Overview (aka. Goal Summary)

Allow OpenShift deployments on Azure to configure public or private exposure for the OpenShift API and OpenShift Ingress separately at installation time.

Goals (aka. expected user outcomes)

Reconcile the difference between the publishing strategy the Installer provides on Azure and what ARO offers, by upstreaming the capability to set public and private exposure separately for the API server and Ingress components.

Requirements (aka. Acceptance Criteria):

The user should be able to provide public or private publish configuration at installation time for the API and Ingress components.

Use Cases (Optional):

The initial use case will be for ARO customers, as explained in ARO-2803.

Background

ARO currently offers the ability to specify API server and ingress visibility at install time. You can set either to public or private, and they can differ (e.g. public ingress, private API server).

OpenShift currently does not have this feature natively; it must be done either as a day-2 operation or via some other mechanism. Depending on the value of "publish" in your install config (Internal | External), the API and Ingress components are either both internet-accessible or both not.
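To make the difference concrete, the shell sketch below prints publishing settings for a given API server and ingress visibility. When both match, the existing single `publish` value is enough; when they differ, it assumes the installer accepts `publish: Mixed` plus an `operatorPublishingStrategy` block with `apiserver` and `ingress` keys. Those field names are an assumption about how the installer exposes the split and should be checked against the installer documentation; the function name is hypothetical.

```shell
# Sketch: render install-config publish settings from the desired API server
# and ingress visibility (each "Internal" or "External").
# ASSUMPTION: "publish: Mixed" and operatorPublishingStrategy (apiserver/ingress)
# are the field names; verify against the installer docs for your release.
publish_stanza() {
  api=$1
  ingress=$2
  for v in "$api" "$ingress"; do
    case $v in
      Internal|External) ;;
      *) echo "invalid visibility: $v (want Internal or External)" >&2; return 1 ;;
    esac
  done
  if [ "$api" = "$ingress" ]; then
    # Same visibility for both components: today's single setting covers it.
    printf 'publish: %s\n' "$api"
  else
    # Split visibility, e.g. the ARO-style private API server with public ingress.
    printf 'publish: Mixed\noperatorPublishingStrategy:\n  apiserver: %s\n  ingress: %s\n' "$api" "$ingress"
  fi
}
```

For example, `publish_stanza Internal External` describes a private API server with public ingress, while `publish_stanza External External` reduces to the existing `publish: External`.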

Documentation Considerations

This will require user-facing documentation, like any other option we currently document for the OpenShift Installer.

Epic CORS-2743: Hybrid SRE: Mix of public and private API server and ingress configurability

Feature Overview:

Enable creating a mix of public and private API server and ingress configurations on Azure, to improve access to the Kubernetes cluster in certain circumstances.

Goals:

Allow users to create the new mix of public and private API servers and to successfully resolve the URLs/IPs to access the cluster.

Requirements:

Users provide the setup they need in the install config.
URL resolution is still possible after the mix is installed.

Use Cases:

ARO needs this for a setup they have in mind.
https://issues.redhat.com/browse/ARO-2803

Questions to Answer:

A spike is needed to figure out how this should be done.

Out of Scope:

Background:

ARO needs this for a setup they have in mind

Customer Considerations:

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status

Documentation Considerations:

Docs for the install config, and for the architecture in general, need to be written, as the change is significant.

Interoperability Considerations:

The cluster ingress operator and CAPI may need to be checked to see whether they must change to accommodate this.

Story CORS-2854: Mix of public and private ingress and api-servers

User Story:

As a developer, I want to be able to:

Set the visibility of Ingress and the API server.

so that I can

Support the ARO functionality of mixed public and private components on day 1.

Acceptance Criteria:

Description of criteria:

ARO/users can set the visibility of ingress and API servers.

(optional) Out of Scope:

Engineering Details:

Engineering detail 1
Engineering detail 2

This requires/does not require a design proposal.
This requires/does not require a feature gate.

https://github.com/openshift/installer/pull/7547

Feature OCPSTRAT-717: Support Azure Storage Account encryption

Feature Overview (aka. Goal Summary)

Add support for the Installer to encrypt the Storage Account used for OpenShift on Azure at installation time.

Goals (aka. expected user outcomes)

As a user, I can instruct the Installer to encrypt the Storage Account created while deploying OpenShift on Azure, for increased security.

Requirements (aka. Acceptance Criteria):

The user is able to provide a Storage Account encryption ID to be used when the Installer creates the Storage Account for OpenShift on Azure.

Background

Related work on disk encryption for Azure was delivered as part of the ~~OCPSTRAT-308~~ feature. Now we are extending this to the Storage Account.

Documentation Considerations

The usual documentation will be required to explain how to use this option in the Installer.

Implementation Considerations

Terraform is used for storage account creation

Epic CORS-2744: Hybrid SRE: Storage Account Encryption

Feature Overview:

Storage accounts on Azure need an option to be encrypted, for better security.

Goals:

Users should be able to encrypt the storage account

Requirements:

The storage account (SA) encryption ID is accepted and used during Terraform provisioning to encrypt the SA.

Use Cases:

Questions to Answer:

Out of Scope:

Nothing

Background:

Similar work was done for disk encryption and needs to be extended to storage accounts.
https://github.com/openshift/installer/pull/5641

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status

Documentation Considerations:

Documentation covering the encryption needs to be written.

Interoperability Considerations:

Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status

Story CORS-2845: Hybrid SRE: Allow for storage account encryption

User Story:

As a user, I want to be able to:

Pass customer managed keys to the installer

so that

The installer encrypts the storage accounts it creates

Acceptance Criteria:

Description of criteria:

User is able to pass the necessary information for encryption through the install config
Storage account created for the control plane use is encrypted

(optional) Out of Scope:

Not encrypting the storage account used for bootstrap

Engineering Details:

Encryption requires the key vault ID and user-assigned identity keys described here:
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/storage_account#customer_managed_key
The storage account tier must be set to `Premium` and its kind to `StorageV2`, which is the default kind.
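As an illustration of those Terraform requirements, the sketch below prints the relevant `azurerm_storage_account` arguments for a given key and identity. Only the argument names (`account_tier`, `account_kind`, `identity`, and `customer_managed_key` with `key_vault_key_id` and `user_assigned_identity_id`) come from the azurerm provider; the resource label, the `LRS` replication type, the omitted required arguments, and the function name are hypothetical placeholders, and this is not the installer's actual Terraform.

```shell
# Sketch: print azurerm_storage_account arguments for a customer-managed key.
# Resource label, replication type and function name are hypothetical; required
# arguments such as name, resource_group_name and location are left out.
cmk_storage_account_hcl() {
  key_id=$1        # Key Vault key ID
  identity_id=$2   # user-assigned managed identity resource ID
  if [ -z "$key_id" ] || [ -z "$identity_id" ]; then
    echo "usage: cmk_storage_account_hcl <key_vault_key_id> <user_assigned_identity_id>" >&2
    return 1
  fi
  cat <<EOF
resource "azurerm_storage_account" "cluster" {
  # name, resource_group_name, location omitted
  account_tier             = "Premium"
  account_kind             = "StorageV2"
  account_replication_type = "LRS"

  identity {
    type         = "UserAssigned"
    identity_ids = ["$identity_id"]
  }

  customer_managed_key {
    key_vault_key_id          = "$key_id"
    user_assigned_identity_id = "$identity_id"
  }
}
EOF
}
```

The same identity is referenced twice: once to attach it to the account and once to tell the key configuration which identity reads the key.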

This requires/does not require a design proposal.
This requires/does not require a feature gate.

https://github.com/openshift/installer/pull/7520

Feature OCPSTRAT-722: [Phase 1] Hardware RAID support with Metal3 via Redfish

Goal

Hardware RAID support on Dell, Supermicro and HPE with Metal3.

Why is this important

Setting up RAID devices is a common hardware operation for OpenShift nodes. While there has been work at Fujitsu to configure RAID on Fujitsu servers with Metal3, we don't support any generic Redfish interface to extend this support and set it up in Metal3.

Dell, Supermicro and HPE, the most common hardware platforms we find in our customers' environments, are the main targets.

Feature OCPSTRAT-73: Preview: Custom RHCOS boot images

Feature Overview

CoreOS Layering allows users to derive customized disk images for cluster nodes and update their nodes to these custom images. However, OpenShift still uses pristine images for the first boot of any machine. Customers should be allowed to use customized firstboot RHCOS images for machine installation.

This is critical to the needs of many OEMs and of those working with cutting-edge hardware with rapidly developing drivers.

Goals

Allow customers to load out-of-tree device drivers required for RHCOS installation, usually NIC or storage drivers.
It is OK for the initial phase to apply only to bare metal.

Epic COS-1935: Custom RHCOS Boot Images - Phase 1

Investigate how we can enable customers to create their own boot images from a derived container.

One potential path is to work with the Image Builder team on which technology can be shared between our pipeline(s) and their build service.

Phase 1 focuses on building a RAW/QCOW2 FCOS/RHCOS disk image from a container image using osbuild and integrating with COSA / our pipeline.

Story COS-2526: create coreos.aleph-version OSBuild stage

In COSA, when we create a disk image we stamp in a .coreos-aleph-version.json file that we later use for various other things. We need an equivalent component when building disk images using OSBuild. Today the file is created using this code:

cat > $rootfs/.coreos-aleph-version.json << EOF
{
    "build": "${buildid}",
    "ref": "${ref}",
    "ostree-commit": "${commit}",
    "imgid": "${imgid}"
}
EOF

We can feed some of this information into OSBuild via the manifest file, but it would be nice to detect it dynamically from the installed tree, so that future users of this stage don't have to know the values.

Also, since the `imgid` part depends on the kind of image we are creating, we might need a separate pipeline (e.g. `tree-qemu`) that adds the correct `imgid` part and is then fed into the qcow2 assembler. We will probably also need to do something similar for `ignition.platform.id=qemu` in the future.
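A shell sketch of the dynamic-detection idea: it reads the ostree commit from the deployment directory in the installed tree and still takes `buildid`, `ref` and `imgid` as arguments, since `imgid` depends on the image being built. It assumes a single deployment laid out as `ostree/deploy/<stateroot>/deploy/<commit>.<serial>` under the root filesystem, and values that need no JSON escaping; the function names are hypothetical and this is not the OSBuild stage itself.

```shell
# Sketch: write .coreos-aleph-version.json, detecting the ostree commit from the
# installed tree. ASSUMPTIONS: exactly one deployment directory named
# <commit>.<serial> under $rootfs/ostree/deploy/*/deploy/, and values that need
# no JSON escaping. Function names are hypothetical.

aleph_commit_from_tree() {
  rootfs=$1
  count=0
  for d in "$rootfs"/ostree/deploy/*/deploy/*; do
    [ -d "$d" ] || continue      # skips <commit>.<serial>.origin files and unmatched globs
    count=$((count + 1))
    deployment=$(basename "$d")
  done
  if [ "$count" -ne 1 ]; then
    echo "expected one deployment under $rootfs/ostree/deploy, found $count" >&2
    return 1
  fi
  printf '%s\n' "${deployment%.*}"   # drop the ".<serial>" suffix
}

write_aleph_version() {
  rootfs=$1 buildid=$2 ref=$3 imgid=$4
  commit=$(aleph_commit_from_tree "$rootfs") || return 1
  cat > "$rootfs/.coreos-aleph-version.json" <<EOF
{
    "build": "$buildid",
    "ref": "$ref",
    "ostree-commit": "$commit",
    "imgid": "$imgid"
}
EOF
}
```

With the commit read from the tree, only the build metadata and the per-image `imgid` would still need to come from the manifest, which fits a per-image pipeline such as `tree-qemu`.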

GitHub Internal PR: https://github.com/dustymabe/osbuild/pull/10
Upstream PR to OSBuild: https://github.com/osbuild/osbuild/pull/1475

https://github.com/openshift/machine-config-operator/pull/4034

Feature OCPSTRAT-731: OpenShift Optional Capabilities (Phase 6)

Feature Overview

As a Cluster Administrator, I want to opt out of certain operators at deployment time using any of the supported installation methods (UPI, IPI, Assisted Installer, Agent-based Installer) from UI (e.g. OCP Console, OCM, Assisted Installer), CLI (e.g. oc, rosa), and API.

As a Cluster Administrator, I want to opt in to previously-disabled operators (at deployment time) from UI (e.g. OCP Console, OCM, Assisted Installer), CLI (e.g. oc, rosa), and API.
As a ROSA service administrator, I want to exclude/disable Cluster Monitoring when I deploy OpenShift with HyperShift, using any of the supported installation methods including the ROSA wizard in OCM and rosa cli, since I get cluster metrics from the control plane. This configuration should persist not only through initial deployment but also through cluster lifecycle operations like upgrades.
As a ROSA service administrator, I want to exclude/disable the Ingress Operator when I deploy OpenShift with HyperShift, using any of the supported installation methods including the ROSA wizard in OCM and rosa cli, as I want to use my preferred load balancer (e.g. AWS load balancer). This configuration should persist not only through initial deployment but also through cluster lifecycle operations like upgrades.

Goals

Make it possible for customers and Red Hat teams producing OCP distributions/topologies/experiences to enable/disable some CVO components while still keeping their cluster supported.

Scenarios

This feature must consider the different deployment footprints including self-managed and managed OpenShift, connected vs. disconnected (restricted and air-gapped), supported topologies (standard HA, compact cluster, SNO), etc.
Enabled/disabled configuration must persist throughout the cluster lifecycle, including upgrades.
If there's any risk/impact of data loss or service unavailability (for Day 2 operations), the system must provide guidance on what the risks are and let the user decide whether the risk is worth taking.

Requirements

This Section: A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.

Requirement	Notes	isMvp?
CI - MUST be running successfully with test automation	This is a requirement for ALL features.	YES
Release Technical Enablement	Provide necessary release enablement details and documents.	YES

(Optional) Use Cases

This Section:

Main success scenarios - high-level user stories
Alternate flow/scenarios - high-level user stories
...

Questions to answer…

Out of Scope

Background, and strategic fit

This is part of the overall multi-release Composable OpenShift effort (OCPPLAN-9638), which is being delivered in multiple phases:

Phase 1 (OpenShift 4.11): ~~OCPPLAN-7589~~ Provide a way with CVO to allow disabling and enabling of operators

~~CORS-1873~~ Installer to allow users to select OpenShift components to be included/excluded
~~OTA-555~~ Provide a way with CVO to allow disabling and enabling of operators
~~OLM-2415~~ Make the marketplace operator optional
~~SO-11~~ Make samples operator optional
~~METAL-162~~ Make cluster baremetal operator optional
~~OCPPLAN-8286~~ CI Job for disabled optional capabilities

Phase 2 (OpenShift 4.12): ~~OCPPLAN-7589~~ Provide a way with CVO to allow disabling and enabling of operators

~~CONSOLE-3160~~ Make console operator optional
~~CCXDEV-8079~~ Make Insights operator optional
~~CNF-5645~~ Make storage operator optional
~~CNF-5646~~ Make csi-snapshot-controller optional

Phase 3 (OpenShift 4.13): ~~OCPBU-117~~

~~OTA-554~~ Make oc aware of cluster capabilities
~~PSAP-741~~ Make Node Tuning Operator (including PAO controllers) optional

Phase 4 (OpenShift 4.14): ~~OCPSTRAT-36~~ (formerly ~~OCPBU-236~~)

OCPBU-352 Make Ingress Operator optional
~~CCO-186~~ ccoctl support for credentialing optional capabilities
~~MCO-499~~ MCD should manage certificates via a separate, non-MC path (formerly ~~IR-230~~ Make node-ca managed by CVO)
~~CNF-5642~~ Make cluster autoscaler optional
~~CNF-5643~~ - Make machine-api operator optional
~~WRKLDS-695~~ - Make DeploymentConfig API + controller optional
~~CNV-16274 OpenShift Virtualization on the Red Hat Application Cloud (not applicable)~~
~~CNF-9115~~ - Leverage Composable OpenShift feature to make control-plane-machine-set optional
~~BUILD-565~~ - Make Build v1 API + controller optional
~~CNF-5647~~ Leverage Composable OpenShift feature to make image-registry optional (replaces ~~IR-351~~ - Make Image Registry Operator optional)

Phase 5 (OpenShift 4.15): ~~OCPSTRAT-421~~ (formerly ~~OCPBU-519~~)

~~OCPVE-634~~ - Leverage Composable OpenShift feature to make olm optional
~~CCO-419~~ (~~OCPVE-629~~) - Leverage Composable OpenShift feature to make cloud-credential optional

Phase 6 (OpenShift 4.16): OCPSTRAT-731

OCPSTRAT-346 (OCPBU-352) Make Ingress Operator optional
~~OCPCLOUD-2151~~ (~~OCPVE-628~~) - Leverage Composable OpenShift feature to make cloud-controller-manager optional
MON-3152 (OBSDA-242) Optional built-in monitoring

Phase 7 (OpenShift 4.17): OCPSTRAT-1308

IR-400 - Remove node-ca from CIRO*
CCO-493 Make Cloud Credential Operator optional for remaining providers and topologies (non-SNO topologies)
CNF-9116 Leverage Composable OpenShift feature to make machine-auto-approver optional

References

Assumptions

Customer Considerations

Documentation Considerations

Questions to be addressed:

What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
Does this feature have doc impact?
New Content, Updates to existing content, Release Note, or No Doc Impact
If unsure and no Technical Writer is available, please contact Content Strategy.
What concepts do customers need to understand to be successful in [action]?
How do we expect customers will use the feature? For what purpose(s)?
What reference material might a customer want/need to complete [action]?
Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
What is the doc impact (New Content, Updates to existing content, or Release Note)?

Epic OCPEDGE-20: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Story OCPEDGE-500: Bump api at CVO and fix tests

Feature OCPSTRAT-736: Add support to AWS Wavelength Zones

Feature Overview (aka. Goal Summary)

Support AWS Wavelength Zones as a target infrastructure on which to deploy OpenShift compute nodes.

Goals (aka. expected user outcomes)

As a user, I want to deploy OpenShift compute nodes on AWS Wavelength Zones at install time so I can leverage this infrastructure to deploy edge computing applications.

As a user, I want to extend an existing OpenShift cluster on AWS by deploying compute nodes on AWS Wavelength Zones so I can leverage this infrastructure to deploy edge computing applications.

Requirements (aka. Acceptance Criteria):

The Installer will be able to deploy OpenShift in the public region into an existing VPC, with compute nodes on AWS Wavelength Zones in an existing subnet.

The Installer will be able to deploy OpenShift in the public region with compute nodes on AWS Wavelength Zones, automating the VPC creation in the public region and the subnet creation in the AWS Wavelength Zone.

An existing OpenShift cluster on AWS public region can be extended by adding additional compute nodes (that can be automatically scaled) into AWS Wavelength Zones.

Use Cases (Optional):

Build media and entertainment applications.

Accelerate ML inference at the edge.

Develop connected vehicle applications.

Background

There is widespread demand for running specific workloads in edge locations on cloud providers. We have added support for AWS Outposts and AWS Local Zones. AWS Wavelength Zones is a target infrastructure that customers, including ROSA customers, are asking for.

Documentation Considerations

Standard documentation is required to instruct users on how to use this feature.

Epic SPLAT-1234: [aws] Add support to AWS Wavelength with nodes in public subnets - Day 0+2

Epic Goal

Support AWS Wavelength Zones as a target infrastructure on which to deploy OpenShift compute nodes.

Why is this important?

There is widespread demand for running specific workloads in edge locations on cloud providers. We have added support for AWS Outposts and AWS Local Zones. AWS Wavelength Zones is a target infrastructure that customers, including ROSA customers, are asking for.

Scenarios

As a user, I want to deploy OpenShift compute nodes on AWS Wavelength Zones at install time in public subnets so I can leverage this infrastructure to deploy edge computing applications.

As a user, I want to deploy OpenShift compute nodes on AWS Wavelength Zones in public subnets in existing clusters installed with edge nodes so I can leverage this infrastructure to deploy edge computing applications.

Acceptance Criteria

MAPI Provider AWS supports Carrier IP Address assignment to EC2 instances deployed by a MachineSet definition in public subnets (with PublicIP set to true)
Documentation with instructions provided on how to deploy machines in public subnets in AWS Wavelength Zones
CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Story SPLAT-1220: [aws][wavelength] MAPI assign Carrier Public IP address when PublicIP is set

USER STORY:

As a developer, I would like to deploy ultra-low-latency sensitive applications in Wavelength Zones without needing an external load balancer to ingress traffic to edge nodes, so I can expand my application coverage using OCP.
As an OCP administrator, I would like to create machine sets to deploy instances in public subnets in Wavelength Zones, so I can quickly deliver nodes on which developers can deploy applications in those zones.
As an OCP developer, I would like MAPI to support assigning a carrier public IP address when launching nodes in Wavelength Zone subnets, so administrators can use the same interface as for regular zones to extend worker nodes to edge zones.

Goal:

Wavelength Zones operate in the carrier network. To ingress traffic to instances running in those zones, a Carrier IP address must be assigned. EC2 assigns the Carrier IP address to the instance only if the network interface flag AssociateCarrierIpAddress is set to true when the instance is provisioned.

PublicIP is the existing MachineSet flag for assigning a public IP address to nodes running in regular zones. The goal of this card is to teach the MAPI AWS provider to look up the zone type of the subnet and, when the value is 'wavelength-zone', set the flag AssociateCarrierIpAddress to true instead of the default AssociatePublicIpAddress, allowing the EC2 service to assign a public IP address in the carrier network.
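The zone-type check described above can be sketched in Go. This is a minimal illustration under stated assumptions, not the MAPI AWS provider's implementation: networkInterfaceFlags and publicIPFlags are hypothetical names, the struct models only the two EC2 flags named in this card instead of the AWS SDK types, and the zone type string is assumed to have already been looked up for the subnet.

```go
package main

import "fmt"

// networkInterfaceFlags is a hypothetical, simplified stand-in for the EC2
// network interface specification. Only the two flags discussed in this card
// are modeled; a real provider would use the AWS SDK types.
type networkInterfaceFlags struct {
	AssociatePublicIPAddress  bool
	AssociateCarrierIPAddress bool
}

// wavelengthZoneType is the zone type value that marks a Wavelength Zone.
const wavelengthZoneType = "wavelength-zone"

// publicIPFlags translates the existing MachineSet PublicIP setting into the
// EC2 flag that matches the subnet's zone type, so the MachineSet API itself
// does not change.
func publicIPFlags(publicIP bool, zoneType string) networkInterfaceFlags {
	if !publicIP {
		// No public address requested: leave both flags unset.
		return networkInterfaceFlags{}
	}
	if zoneType == wavelengthZoneType {
		// Wavelength Zones: request a Carrier IP address in the carrier network.
		return networkInterfaceFlags{AssociateCarrierIPAddress: true}
	}
	// Any other zone type keeps the default public IP behavior.
	return networkInterfaceFlags{AssociatePublicIPAddress: true}
}

func main() {
	for _, zoneType := range []string{"availability-zone", "wavelength-zone"} {
		fmt.Printf("%s: %+v\n", zoneType, publicIPFlags(true, zoneType))
	}
}
```

Keeping the decision in one function means the MachineSet only carries the existing PublicIP setting, and the zone type alone decides which EC2 flag is sent.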

Required:

MAPI AWS provider must support public IP assignment by discovering the zone type without changing the API

ACCEPTANCE CRITERIA:

MAPI AWS Provider PR must be created
PR must be passing unit tests
PR must be approved

ENGINEERING DETAILS:

https://github.com/openshift/machine-api-provider-aws/pull/78

Epic SPLAT-1218: [aws] Add support to AWS Wavelength - Day 0 BYO VPC

Epic Goal

Support AWS Wavelength Zones as a target infrastructure on which to deploy OpenShift compute nodes in existing VPCs.

Why is this important?

There is widespread demand for running specific workloads in edge locations on cloud providers. We have added support for AWS Outposts and AWS Local Zones. AWS Wavelength Zones is a target infrastructure that customers, including ROSA customers, are asking for.

Scenarios

The Installer will be able to deploy OpenShift in the public region into an existing VPC, with compute nodes on AWS Wavelength Zones in an existing subnet.

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Epic SPLAT-1125: [aws] Add support to AWS Wavelength - Day 0 Fully automated

Epic Goal

Support AWS Wavelength Zones as a target infrastructure on which to deploy OpenShift compute nodes.

Why is this important?

There is widespread demand for running specific workloads in edge locations on cloud providers. We have added support for AWS Outposts and AWS Local Zones. AWS Wavelength Zones is a target infrastructure that customers, including ROSA customers, are asking for.

Scenarios

As a user, I want to deploy OpenShift compute nodes on AWS Wavelength Zones at install time so I can leverage this infrastructure to deploy edge computing applications.

Acceptance Criteria

PR reviewed and merged
CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Story SPLAT-1160: [aws][wavelength] Create support in installer to create infrastructure in WLZ (Day-0)

USER STORY:

As a cluster administrator, I want to deploy OpenShift compute nodes on AWS Wavelength Zones at install time, so I can leverage this infrastructure to deploy edge computing applications.

DESCRIPTION:

AWS Wavelength Zones are infrastructure deployments running in a carrier-owned RAN (Radio Access Network), outside the AWS region. AWS provides a limited set of services in them, including compute, ingress of network traffic through the carrier network, and private network connectivity to the VPC in the region.

The Installer must be able to deploy OpenShift in the public region with compute (worker) nodes on AWS Wavelength Zones, automating the VPC creation in the public region and the subnet creation in the AWS Wavelength Zone.

The installer must create the private and public subnets, and the worker nodes must use the private subnet.

In the traditional deployment of OpenShift on AWS, the private subnet egresses traffic to the internet using a NAT Gateway. NAT Gateways are not currently supported in AWS Wavelength Zones. To work around that, the deployment must follow the same strategy of associating the private subnets in Wavelength Zones with a route table in the region, preferably the route table for the WLZ's parent zone*.

*Every "edge zone" (Local and Wavelength Zone) is associated with one zone in the region, named the parent zone.

To ingress traffic to Wavelength Zone nodes, a public subnet must be created and associated with a route table that has a carrier gateway as its default route. A carrier gateway is similar to an internet gateway.
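The subnet and route table layout described above can be sketched as a small planning function in Go. This is an illustration only, assuming a simplified model: the zone, routeTable and subnetPlan types, the planWavelengthSubnets function, the subnet naming scheme, and the NAT gateway and carrier gateway identifiers are all hypothetical and do not come from the installer. The zone names in main are illustrative values, not a verified parent-zone pairing.

```go
package main

import "fmt"

// zone is a hypothetical, simplified view of an AWS zone.
type zone struct {
	Name       string
	Type       string // for example "availability-zone" or "wavelength-zone"
	ParentZone string // zone in the region that the edge zone is attached to
}

// routeTable records only the target of the default (0.0.0.0/0) route.
type routeTable struct {
	Name         string
	DefaultRoute string
}

// subnetPlan pairs a subnet with the route table it is associated with.
type subnetPlan struct {
	Subnet     string
	RouteTable routeTable
}

// planWavelengthSubnets returns the route table associations for the private
// and public subnets of one Wavelength Zone. privateByZone maps each zone in
// the region to its private route table, which egresses through a NAT gateway.
func planWavelengthSubnets(z zone, privateByZone map[string]routeTable, carrierGateway string) ([]subnetPlan, error) {
	if z.Type != "wavelength-zone" {
		return nil, fmt.Errorf("zone %s has type %q, not wavelength-zone", z.Name, z.Type)
	}
	parent, ok := privateByZone[z.ParentZone]
	if !ok {
		return nil, fmt.Errorf("no private route table for parent zone %s", z.ParentZone)
	}
	return []subnetPlan{
		// NAT gateways are unavailable in the Wavelength Zone, so the private
		// subnet reuses the parent zone's private route table for egress.
		{Subnet: z.Name + "-private", RouteTable: parent},
		// The public subnet receives ingress traffic through the carrier gateway.
		{Subnet: z.Name + "-public", RouteTable: routeTable{Name: z.Name + "-public", DefaultRoute: carrierGateway}},
	}, nil
}

func main() {
	wlz := zone{Name: "us-east-1-wl1-bos-wlz-1", Type: "wavelength-zone", ParentZone: "us-east-1a"}
	private := map[string]routeTable{
		"us-east-1a": {Name: "private-us-east-1a", DefaultRoute: "nat-us-east-1a"},
	}
	plans, err := planWavelengthSubnets(wlz, private, "cagw-example")
	if err != nil {
		panic(err)
	}
	for _, p := range plans {
		fmt.Printf("%s -> %s (default route: %s)\n", p.Subnet, p.RouteTable.Name, p.RouteTable.DefaultRoute)
	}
}
```

Reusing the parent zone's route table keeps egress on the NAT gateway that already serves that zone, while only the public subnet needs a new route table pointing at the carrier gateway.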

Required:

installer change supporting Wavelength Zones created as an "edge" compute node

Nice to have:

...

ACCEPTANCE CRITERIA:

ENGINEERING DETAILS:

https://github.com/openshift/installer/pull/7369

Feature OCPSTRAT-74: Follow-up features for IBM Cloud

Feature Overview

Extend the OpenShift on IBM Cloud integration with additional features so that its capabilities match those available on other cloud platforms.

Goals

Extend the existing features while deploying OpenShift on IBM Cloud.

Background, and strategic fit

This top-level feature serves as a placeholder for the IBM team working on new features for this integration, to keep their existing internal backlog in sync with the corresponding Features/Epics in Red Hat's Jira.

Epic SPLAT-737: Enable installation of disconnected clusters on IBM Cloud

Epic Goal

Enable installation of disconnected clusters on IBM Cloud. This epic will track associated work.

Why is this important?

We would like to support this at GA. It is an 'optional' feature that will be pursued after private cluster installation support is complete (https://issues.redhat.com/browse/SPLAT-731).

Scenarios

Install a disconnected cluster on IBM Cloud.

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.

Story OCPCLOUD-2266: Add IBM Cloud service endpoint override support (MAPI)

User Story:

A user currently cannot create a Disconnected cluster on IBM Cloud using IPI.
Support for BYON and Private clusters already exists on IBM Cloud, but overriding IBM Cloud Service endpoints is not supported, and Disconnected support requires it in order to reach IBM Cloud private endpoints.

Description:

IBM-dependent components of OCP need to support a set of endpoint override values in order to reach IBM Cloud Services in Disconnected environments.

The MAPI component needs to direct all API calls to IBM Cloud Services to these endpoint values, so that it can communicate in environments where the public or default IBM Cloud Service endpoint is not available.

The endpoint overrides are available via the infrastructure/cluster resource (.status.platformStatus.ibmcloud.serviceEndpoints), which is how most components consume cluster-specific configuration (Ingress, MAPI, etc.). It is structured as follows:

apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  creationTimestamp: "2023-10-04T22:02:15Z"
  generation: 1
  name: cluster
  resourceVersion: "430"
  uid: b923c3de-81fc-4a0e-9fdb-8c4c337fba08
spec:
  cloudConfig:
    key: config
    name: cloud-provider-config
  platformSpec:
    type: IBMCloud
status:
  apiServerInternalURI: https://api-int.us-east-disconnect-21.ipi-cjschaef-dns.com:6443
  apiServerURL: https://api.us-east-disconnect-21.ipi-cjschaef-dns.com:6443
  controlPlaneTopology: HighlyAvailable
  cpuPartitioning: None
  etcdDiscoveryDomain: ""
  infrastructureName: us-east-disconnect-21-gtbwd
  infrastructureTopology: HighlyAvailable
  platform: IBMCloud
  platformStatus:
    ibmcloud:
      dnsInstanceCRN: 'crn:v1:bluemix:public:dns-svcs:global:a/fa4fd9fa0695c007d1fdcb69a982868c:f00ac00e-75c2-4774-a5da-44b2183e31f7::'
      location: us-east
      providerType: VPC
      resourceGroupName: us-east-disconnect-21-gtbwd
      serviceEndpoints:
      - name: iam
        url: https://private.us-east.iam.cloud.ibm.com
      - name: vpc
        url: https://us-east.private.iaas.cloud.ibm.com/v1
      - name: resourcecontroller
        url: https://private.us-east.resource-controller.cloud.ibm.com
      - name: resourcemanager
        url: https://private.us-east.resource-controller.cloud.ibm.com
      - name: cis
        url: https://api.private.cis.cloud.ibm.com
      - name: dnsservices
        url: https://api.private.dns-svcs.cloud.ibm.com/v1
      - name: cos
        url: https://s3.direct.us-east.cloud-object-storage.appdomain.cloud
    type: IBMCloud
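One way a component could consume these overrides is a lookup that falls back to its normal endpoint when no override is set. The Go sketch below assumes a hypothetical serviceEndpoint type that mirrors the name/url pairs above (rather than the openshift/api types); endpointFor is a hypothetical helper, not the MAPI code, and the public default URLs in main are example values only.

```go
package main

import "fmt"

// serviceEndpoint mirrors one entry of
// .status.platformStatus.ibmcloud.serviceEndpoints (hypothetical local type).
type serviceEndpoint struct {
	Name string
	URL  string
}

// endpointFor returns the override URL configured for the named service, or
// defaultURL when the cluster does not override that service.
func endpointFor(overrides []serviceEndpoint, name, defaultURL string) string {
	for _, ep := range overrides {
		if ep.Name == name && ep.URL != "" {
			return ep.URL
		}
	}
	return defaultURL
}

func main() {
	// A subset of the overrides from the Infrastructure example above.
	overrides := []serviceEndpoint{
		{Name: "iam", URL: "https://private.us-east.iam.cloud.ibm.com"},
		{Name: "vpc", URL: "https://us-east.private.iaas.cloud.ibm.com/v1"},
	}
	// Example public defaults, used only when no override exists.
	fmt.Println(endpointFor(overrides, "iam", "https://iam.cloud.ibm.com"))
	fmt.Println(endpointFor(overrides, "cis", "https://api.cis.cloud.ibm.com"))
}
```

Falling back to the default keeps connected clusters working unchanged, since they carry no overrides at all.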

The CCM currently relies on updates to the openshift-cloud-controller-manager/cloud-conf configmap to override its required IBM Cloud Service endpoints, for example:

data:
  config: |+
    [global]
    version = 1.1.0
    [kubernetes]
    config-file = ""
    [provider]
    accountID = ...
    clusterID = temp-disconnect-7m6rw
    cluster-default-provider = g2
    region = eu-de
    g2Credentials = /etc/vpc/ibmcloud_api_key
    g2ResourceGroupName = temp-disconnect-7m6rw
    g2VpcName = temp-disconnect-7m6rw-vpc
    g2workerServiceAccountID = ...
    g2VpcSubnetNames = temp-disconnect-7m6rw-subnet-compute-eu-de-1,temp-disconnect-7m6rw-subnet-compute-eu-de-2,temp-disconnect-7m6rw-subnet-compute-eu-de-3,temp-disconnect-7m6rw-subnet-control-plane-eu-de-1,temp-disconnect-7m6rw-subnet-control-plane-eu-de-2,temp-disconnect-7m6rw-subnet-control-plane-eu-de-3
    iamEndpointOverride = https://private.iam.cloud.ibm.com
    g2EndpointOverride = https://eu-de.private.iaas.cloud.ibm.com
    rmEndpointOverride = https://private.resource-controller.cloud.ibm.com

These changes have already landed in the release-1.28 branch (target OCP release-4.15 branch), but we need to make sure they get pulled into the github.com/openshift/cloud-provider-ibm branch and built into a 4.15 image.

Acceptance Criteria:

The Installer validates and injects user-provided endpoint overrides into the cluster deployment process, and the MAPI components use the specified endpoints and start up properly.

https://github.com/openshift/machine-api-provider-ibmcloud/pull/28

Story STOR-1485: Add IBM Cloud service endpoint override support (Storage)

User Story:

Description:

IBM-dependent components of OCP need to support a set of endpoint override values in order to reach IBM Cloud Services in Disconnected environments.

The Storage components need to direct all API calls to IBM Cloud Services to these endpoint values, so that they can communicate in environments where the public or default IBM Cloud Service endpoint is not available.

apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  creationTimestamp: "2023-10-04T22:02:15Z"
  generation: 1
  name: cluster
  resourceVersion: "430"
  uid: b923c3de-81fc-4a0e-9fdb-8c4c337fba08
spec:
  cloudConfig:
    key: config
    name: cloud-provider-config
  platformSpec:
    type: IBMCloud
status:
  apiServerInternalURI: https://api-int.us-east-disconnect-21.ipi-cjschaef-dns.com:6443
  apiServerURL: https://api.us-east-disconnect-21.ipi-cjschaef-dns.com:6443
  controlPlaneTopology: HighlyAvailable
  cpuPartitioning: None
  etcdDiscoveryDomain: ""
  infrastructureName: us-east-disconnect-21-gtbwd
  infrastructureTopology: HighlyAvailable
  platform: IBMCloud
  platformStatus:
    ibmcloud:
      dnsInstanceCRN: 'crn:v1:bluemix:public:dns-svcs:global:a/fa4fd9fa0695c007d1fdcb69a982868c:f00ac00e-75c2-4774-a5da-44b2183e31f7::'
      location: us-east
      providerType: VPC
      resourceGroupName: us-east-disconnect-21-gtbwd
      serviceEndpoints:
      - name: iam
        url: https://private.us-east.iam.cloud.ibm.com
      - name: vpc
        url: https://us-east.private.iaas.cloud.ibm.com/v1
      - name: resourcecontroller
        url: https://private.us-east.resource-controller.cloud.ibm.com
      - name: resourcemanager
        url: https://private.us-east.resource-controller.cloud.ibm.com
      - name: cis
        url: https://api.private.cis.cloud.ibm.com
      - name: dnsservices
        url: https://api.private.dns-svcs.cloud.ibm.com/v1
      - name: cos
        url: https://s3.direct.us-east.cloud-object-storage.appdomain.cloud
    type: IBMCloud

The CCM currently relies on updates to the openshift-cloud-controller-manager/cloud-conf configmap to override its required IBM Cloud Service endpoints, for example:

data:
  config: |+
    [global]
    version = 1.1.0
    [kubernetes]
    config-file = ""
    [provider]
    accountID = ...
    clusterID = temp-disconnect-7m6rw
    cluster-default-provider = g2
    region = eu-de
    g2Credentials = /etc/vpc/ibmcloud_api_key
    g2ResourceGroupName = temp-disconnect-7m6rw
    g2VpcName = temp-disconnect-7m6rw-vpc
    g2workerServiceAccountID = ...
    g2VpcSubnetNames = temp-disconnect-7m6rw-subnet-compute-eu-de-1,temp-disconnect-7m6rw-subnet-compute-eu-de-2,temp-disconnect-7m6rw-subnet-compute-eu-de-3,temp-disconnect-7m6rw-subnet-control-plane-eu-de-1,temp-disconnect-7m6rw-subnet-control-plane-eu-de-2,temp-disconnect-7m6rw-subnet-control-plane-eu-de-3
    iamEndpointOverride = https://private.iam.cloud.ibm.com
    g2EndpointOverride = https://eu-de.private.iaas.cloud.ibm.com
    rmEndpointOverride = https://private.resource-controller.cloud.ibm.com

The Storage component relies on the CCM cloud-conf configmap, but that configmap only supplies the IAM, ResourceManager, and VPC endpoints, since those are the only services the CCM uses. Endpoints for any additional IBM Cloud Services (e.g., COS) are not available in the CCM cloud-conf, but are always present in the infrastructure/cluster resource.
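To see which overrides a consumer could recover from the CCM cloud-conf alone, the Go sketch below parses the override keys shown above and maps them to the service names used in the infrastructure/cluster resource. The key-to-service mapping is an assumption inferred from the two examples (iamEndpointOverride to iam, g2EndpointOverride to vpc, rmEndpointOverride to resourcemanager), and overridesFromCloudConf is a hypothetical helper, not code from the storage operator.

```go
package main

import (
	"fmt"
	"strings"
)

// cloudConfKeyToService maps the cloud-conf override keys from the example
// above to service names in the Infrastructure resource (assumed mapping).
var cloudConfKeyToService = map[string]string{
	"iamEndpointOverride": "iam",
	"g2EndpointOverride":  "vpc",
	"rmEndpointOverride":  "resourcemanager",
}

// overridesFromCloudConf extracts the known endpoint overrides from the
// INI-style cloud-conf text; section headers and other keys are ignored.
func overridesFromCloudConf(conf string) map[string]string {
	out := map[string]string{}
	for _, line := range strings.Split(conf, "\n") {
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			continue
		}
		if svc, known := cloudConfKeyToService[strings.TrimSpace(key)]; known {
			out[svc] = strings.TrimSpace(value)
		}
	}
	return out
}

func main() {
	conf := `[provider]
region = eu-de
iamEndpointOverride = https://private.iam.cloud.ibm.com
g2EndpointOverride = https://eu-de.private.iaas.cloud.ibm.com
rmEndpointOverride = https://private.resource-controller.cloud.ibm.com`
	overrides := overridesFromCloudConf(conf)
	fmt.Println(len(overrides), overrides["vpc"])
	// COS never appears in cloud-conf, so it must come from infrastructure/cluster.
	_, hasCOS := overrides["cos"]
	fmt.Println("cos in cloud-conf:", hasCOS)
}
```

The missing COS entry is the practical reason for reading the infrastructure/cluster resource directly instead of relying on cloud-conf.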

Acceptance Criteria:

The Installer validates and injects user-provided endpoint overrides into the cluster deployment process, and the storage components use the specified endpoints and start up properly.

https://github.com/openshift/ibm-vpc-block-csi-driver-operator/pull/89

Story CORS-3020: IBM Terraform Provider release bump to pick up key fixes needed in 4.15

A blocking issue is fixed in IBM Terraform Provider release 1.60.0, and we need that release pulled into OCP 4.15.

IBM PowerVS has been notified.

https://github.com/openshift/installer/pull/7784

Story CORS-2933: Add IBM Cloud service endpoint override support (Installer)

User Story:

Description:

IBM Cloud VPC (x86_64) currently does not support Disconnected cluster installation via IPI.

To add this support, overrides for certain IBM Cloud Service endpoints (e.g., IAM, IaaS) must be configurable and made available in the cluster infrastructure resource.

Implementation depends on the API changes merging first (https://issues.redhat.com/browse/SPLAT-1097).

Acceptance Criteria:

The Installer validates and injects user-provided endpoint overrides into the cluster deployment process.
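As an illustration of the kind of validation this criterion implies, the Go sketch below checks a list of overrides for unknown service names, duplicate entries, and URLs that are not absolute HTTPS URLs. The accepted names are taken from the lowercase names in the example Infrastructure resources earlier in this section, not from the OpenShift API, and validateOverrides and serviceEndpoint are hypothetical; the installer's actual checks may differ.

```go
package main

import (
	"fmt"
	"net/url"
)

// serviceEndpoint is a hypothetical local type for one user-provided override.
type serviceEndpoint struct {
	Name string
	URL  string
}

// knownServices lists the service names used in the example Infrastructure
// resources (an assumption, not the authoritative API enumeration).
var knownServices = map[string]bool{
	"iam": true, "vpc": true, "resourcecontroller": true, "resourcemanager": true,
	"cis": true, "dnsservices": true, "cos": true,
}

// validateOverrides reports every problem it finds instead of stopping at
// the first one, so a user can fix the configuration in a single pass.
func validateOverrides(eps []serviceEndpoint) []error {
	var errs []error
	seen := map[string]bool{}
	for i, ep := range eps {
		if !knownServices[ep.Name] {
			errs = append(errs, fmt.Errorf("serviceEndpoints[%d]: unknown service %q", i, ep.Name))
		}
		if seen[ep.Name] {
			errs = append(errs, fmt.Errorf("serviceEndpoints[%d]: duplicate service %q", i, ep.Name))
		}
		seen[ep.Name] = true
		if u, err := url.Parse(ep.URL); err != nil || u.Scheme != "https" || u.Host == "" {
			errs = append(errs, fmt.Errorf("serviceEndpoints[%d]: %q is not an absolute https URL", i, ep.URL))
		}
	}
	return errs
}

func main() {
	eps := []serviceEndpoint{
		{Name: "iam", URL: "https://private.us-east.iam.cloud.ibm.com"},
		{Name: "cis", URL: "https://api.private.cis.cloud.ibm.com"},
		// A repeated name, such as a copy-paste slip, is reported as a duplicate.
		{Name: "cis", URL: "https://s3.direct.us-east.cloud-object-storage.appdomain.cloud"},
	}
	for _, err := range validateOverrides(eps) {
		fmt.Println(err)
	}
}
```

Rejecting duplicates up front matters because a lookup that returns the first match would otherwise silently ignore the second entry.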

https://github.com/openshift/installer/pull/7632

Story NE-1402: Add IBM Cloud service endpoint override support (Ingress Operator)

User Story:

Description:

IBM-dependent components of OCP need to support a set of endpoint override values in order to reach IBM Cloud Services in Disconnected environments.

The Ingress Operator components need to direct all API calls to IBM Cloud Services to these endpoint values, so that they can communicate in environments where the public or default IBM Cloud Service endpoint is not available.

apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  creationTimestamp: "2023-10-04T22:02:15Z"
  generation: 1
  name: cluster
  resourceVersion: "430"
  uid: b923c3de-81fc-4a0e-9fdb-8c4c337fba08
spec:
  cloudConfig:
    key: config
    name: cloud-provider-config
  platformSpec:
    type: IBMCloud
status:
  apiServerInternalURI: https://api-int.us-east-disconnect-21.ipi-cjschaef-dns.com:6443
  apiServerURL: https://api.us-east-disconnect-21.ipi-cjschaef-dns.com:6443
  controlPlaneTopology: HighlyAvailable
  cpuPartitioning: None
  etcdDiscoveryDomain: ""
  infrastructureName: us-east-disconnect-21-gtbwd
  infrastructureTopology: HighlyAvailable
  platform: IBMCloud
  platformStatus:
    ibmcloud:
      dnsInstanceCRN: 'crn:v1:bluemix:public:dns-svcs:global:a/fa4fd9fa0695c007d1fdcb69a982868c:f00ac00e-75c2-4774-a5da-44b2183e31f7::'
      location: us-east
      providerType: VPC
      resourceGroupName: us-east-disconnect-21-gtbwd
      serviceEndpoints:
      - name: iam
        url: https://private.us-east.iam.cloud.ibm.com
      - name: vpc
        url: https://us-east.private.iaas.cloud.ibm.com/v1
      - name: resourcecontroller
        url: https://private.us-east.resource-controller.cloud.ibm.com
      - name: resourcemanager
        url: https://private.us-east.resource-controller.cloud.ibm.com
      - name: cis
        url: https://api.private.cis.cloud.ibm.com
      - name: dnsservices
        url: https://api.private.dns-svcs.cloud.ibm.com/v1
      - name: cos
        url: https://s3.direct.us-east.cloud-object-storage.appdomain.cloud
    type: IBMCloud

The CCM currently relies on updates to the openshift-cloud-controller-manager/cloud-conf configmap to override its required IBM Cloud Service endpoints, for example:

data:
  config: |+
    [global]
    version = 1.1.0
    [kubernetes]
    config-file = ""
    [provider]
    accountID = ...
    clusterID = temp-disconnect-7m6rw
    cluster-default-provider = g2
    region = eu-de
    g2Credentials = /etc/vpc/ibmcloud_api_key
    g2ResourceGroupName = temp-disconnect-7m6rw
    g2VpcName = temp-disconnect-7m6rw-vpc
    g2workerServiceAccountID = ...
    g2VpcSubnetNames = temp-disconnect-7m6rw-subnet-compute-eu-de-1,temp-disconnect-7m6rw-subnet-compute-eu-de-2,temp-disconnect-7m6rw-subnet-compute-eu-de-3,temp-disconnect-7m6rw-subnet-control-plane-eu-de-1,temp-disconnect-7m6rw-subnet-control-plane-eu-de-2,temp-disconnect-7m6rw-subnet-control-plane-eu-de-3
    iamEndpointOverride = https://private.iam.cloud.ibm.com
    g2EndpointOverride = https://eu-de.private.iaas.cloud.ibm.com
    rmEndpointOverride = https://private.resource-controller.cloud.ibm.com

Acceptance Criteria:

The Installer validates and injects user-provided endpoint overrides into the cluster deployment process, and the Ingress Operator components use the specified endpoints and start up properly.

https://github.com/openshift/cluster-ingress-operator/pull/990

Feature OCPSTRAT-748: On Cluster Layering: Phase 2 (tech preview)

Note: phase 2 target is tech preview.

Feature Overview

In the initial delivery of CoreOS Layering, administrators must provide their own build environment to customize RHCOS images. That could be a traditional RHEL environment, or an enterprising administrator with some knowledge of OCP Builds could potentially set one up on-cluster.

The primary virtue of an on-cluster build path is to continue using the cluster to manage the cluster. No external dependency, batteries-included.

On-cluster, automated RHCOS Layering builds are important for multiple reasons:

One-click/one-command upgrades of OCP are very popular. Many customers may want to make one or just a few customizations but also want to keep that simplified upgrade experience.
Customers who only need to customize RHCOS temporarily (hotfix, driver test package, etc) will find off-cluster builds to be too much friction for one driver.
One of OCP's virtues is that the platform and OS are developed, tested, and versioned together. Off-cluster building breaks that connection and leaves it up to the user to keep the OS up-to-date with the platform containers. We must make it easy for customers to add what they need and keep the OS image matched to the platform containers.

Goals & Requirements

The goal of this feature is primarily to bring the 4.14 progress (~~OCPSTRAT-35~~) to a Tech Preview or GA level of support.
Customers should be able to specify a Containerfile with their customizations and "forget it" as long as the automated builds succeed. If they fail, the admin should be alerted and pointed to the logs from the failed build.
- The admin should then be able to correct the build and resume the upgrade.
Intersect with the Custom Boot Images such that a required custom software component can be present on every boot of every node throughout the installation process including the bootstrap node sequence (example: out-of-box storage driver needed for root disk).
Users can return a pool to an unmodified image easily.
RHEL entitlements should be wired in or at least simple to set up (once).
Parity with current features – including the current drain/reboot suppression list, CoreOS Extensions, and config drift monitoring.

https://drive.google.com/file/d/1dFtAjrBJ7wyTxJz54elVeBxAAImtbwjp/

Epic MCO-665: On-Cluster Layering Tech Preview

This work describes the tech preview state of On Cluster Builds. Major interfaces should be agreed upon by the end of this phase.

Story MCO-746: BuildController unit test suite is flaky

In its current state, the BuildController unit test suite sometimes fails unexpectedly. This causes loss of confidence in the MCO unit test suite and can block PRs from merging, even when the changes a PR introduces are unrelated to BuildController. I suspect that a race condition within the test suite, combined with the suite's aggressive parallelism, causes these unexpected failures.

Done When:

The BuildController unit test suite has been audited and refactored to remove this race condition.

https://github.com/openshift/machine-config-operator/pull/3905

Bug OCPBUGS-18458: Ssh keys cannot be updated in OCB pools

Description of problem:

In an on-cluster build pool, when we create an MC to update the SSH keys, we can't find the new keys on the nodes after the configuration is built and applied.

Version-Release number of selected component (if applicable):

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-08-30-191617   True        False         7h52m   Cluster version is 4.14.0-0.nightly-2023-08-30-191617

How reproducible:

Always

Steps to Reproduce:

1. Enable the on-cluster build functionality in the "worker" pool
2. Check the value of the current keys

$ oc debug  node/$(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[0].metadata.name}") -- chroot /host cat /home/core/.ssh/authorized_keys.d/ignition
Warning: metadata.name: this is used in the Pod's hostname, which can result in surprising behavior; a DNS label is recommended: [must be no more than 63 characters]
Starting pod/sregidor-sr3-bfxxj-worker-a-h5b5jcopenshift-qeinternal-debug-ljxgx ...
To use host binaries, run `chroot /host`
ssh-rsa AAAA..................................................................................................................................................................qe@redhat.com


Removing debug pod ...


3. Create a new MC to configure the "core" user's SSH keys. We add 2 extra keys.


$ oc get mc -o yaml tc-59426-add-ssh-key-9tv2owyp
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  creationTimestamp: "2023-09-01T10:57:14Z"
  generation: 1
  labels:
    machineconfiguration.openshift.io/role: worker
  name: tc-59426-add-ssh-key-9tv2owyp
  resourceVersion: "135885"
  uid: 3cf31fbb-7a4e-472d-8430-0c0eb49420fc
spec:
  config:
    ignition:
      version: 3.2.0
    passwd:
      users:
      - name: core
        sshAuthorizedKeys:
        - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDPmGf/sfIYog......
          mco_test@redhat.com
        - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDf.......
          mco_test2@redhat.com

4. Verify that the new rendered MC contains the 3 keys:

$ oc get mcp worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-02d04d7c47cd3e08f8f305541cf85000   True      False      False      2              2                   2                     0                      8h

$ oc get mc -o yaml rendered-worker-02d04d7c47cd3e08f8f305541cf85000 | grep users -A9
      users:
      - name: core
        sshAuthorizedKeys:
        - ssh-rsa AAAAB...............................qe@redhat.com
        - ssh-rsa AAAAB...............................mco_test@redhat.com
        - ssh-rsa AAAAB...............................mco_test2@redhat.com
    storage:

Actual results:

Only the initial key is present on the node:

$ oc debug  node/$(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[0].metadata.name}") -- chroot /host cat /home/core/.ssh/authorized_keys.d/ignition
Warning: metadata.name: this is used in the Pod's hostname, which can result in surprising behavior; a DNS label is recommended: [must be no more than 63 characters]
Starting pod/sregidor-sr3-bfxxj-worker-a-h5b5jcopenshift-qeinternal-debug-ljxgx ...
To use host binaries, run `chroot /host`
ssh-rsa AAAA.........qe@redhat.com

Removing debug pod ...

Expected results:

The added SSH keys should also be configured in the /home/core/.ssh/authorized_keys.d/ignition file.
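The manual check in the steps above (keys listed in the rendered MC versus keys present in /home/core/.ssh/authorized_keys.d/ignition on the node) can be scripted. The following Python sketch is illustrative only: it assumes the expected keys have already been copied out of the rendered MachineConfig and the file contents read from the node (for example with the `oc debug ... cat` command shown earlier), and that key lines carry no `authorized_keys` options prefix. The helper names `key_id` and `missing_keys` are hypothetical.

```python
from __future__ import annotations


def key_id(line: str) -> tuple[str, str] | None:
    """Identify a public key by its type and base64 body, ignoring the trailing comment.

    Assumes the line has no authorized_keys options prefix (e.g. no 'command="..."').
    """
    fields = line.split()
    if len(fields) < 2:
        return None
    return fields[0], fields[1]


def missing_keys(expected: list[str], authorized_keys_text: str) -> list[str]:
    """Return the expected keys that do not appear in the node's authorized_keys text."""
    present = set()
    for line in authorized_keys_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        kid = key_id(line)
        if kid is not None:
            present.add(kid)
    return [key for key in expected if key_id(key) not in present]


# Keys from the rendered MC (shortened placeholders) versus the file read from the node.
expected = [
    "ssh-rsa AAAAqe qe@redhat.com",
    "ssh-rsa AAAAtest1 mco_test@redhat.com",
    "ssh-rsa AAAAtest2 mco_test2@redhat.com",
]
on_node = "ssh-rsa AAAAqe qe@redhat.com\n"
print(missing_keys(expected, on_node))
```

With data shaped like this bug's, the two keys added by the MC are reported as missing while the initial key is not.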

Additional info:

https://github.com/openshift/machine-config-operator/pull/3946

Bug OCPBUGS-18414: In OCB pools, when a config drift happens and it is fixed, the pool is degraded with error: "Old and new refs are equal"

Description of problem:

In pools with On-Cluster Build enabled, when a config drift happens because a file's content has been changed manually, the MCP becomes degraded (this is expected):

  - lastTransitionTime: "2023-08-31T11:34:33Z"
    message: 'Node sregidor-sr2-2gb5z-worker-a-7tpjd.c.openshift-qe.internal is reporting:
      "unexpected on-disk state validating against quay.io/xxx/xxx@sha256:........................:
      content mismatch for file \"/etc/mco-test-file\""'
    reason: 1 nodes are reporting degraded status on sync
    status: "True"
    type: NodeDegraded


If we fix this drift by restoring the original file's content, the MCP stays degraded, now with this message:

    - lastTransitionTime: "2023-08-31T12:24:47Z"
      message: 'Node sregidor-sr2-2gb5z-worker-a-q7wcb.c.openshift-qe.internal is
        reporting: "failed to update OS to quay.io/xxx/xxx@sha256:.......
        : error running rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/xxx/xxx@sha256:........:
        error: Old and new refs are equal: ostree-unverified-registry:quay.io/xxx/xxx@sha256:..............\n:
        exit status 1"'
      reason: 1 nodes are reporting degraded status on sync
      status: "True"
      type: NodeDegraded

Version-Release number of selected component (if applicable):

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-08-30-191617   True        False         4h18m   Error while reconciling 4.14.0-0.nightly-2023-08-30-191617: the cluster operator monitoring is not available

How reproducible:

Always

Steps to Reproduce:

1. Enable the OCB functionality for the worker pool
$ oc label mcp/worker machineconfiguration.openshift.io/layering-enabled=

(Create the necessary ConfigMaps and secrets for the OCB functionality to work.)

Wait until the new image is created and the nodes are updated.

2. Create a MC to deploy a new file
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: mco-drift-test-file
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - contents:
          source: data:,MCO%20test%20file%0A
        path: /etc/mco-test-file

Wait until the new MC is deployed.

3. Back up the file /etc/mco-test-file, then modify its content

$ oc debug  node/$(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[0].metadata.name}")
Warning: metadata.name: this is used in the Pod's hostname, which can result in surprising behavior; a DNS label is recommended: [must be no more than 63 characters]
Starting pod/sregidor-sr2-2gb5z-worker-a-q7wcbcopenshift-qeinternal-debug-sv85v ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.128.9
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-5.1# cd /etc
sh-5.1# cat mco-test-file 
MCO test file
sh-5.1# cp mco-test-file mco-test-file-back
sh-5.1# echo -n "1" >> mco-test-file


4. Wait until the MCP reports the config drift issue:

$ oc get mcp worker -o yaml
....
  - lastTransitionTime: "2023-08-31T11:34:33Z"
    message: 'Node sregidor-sr2-2gb5z-worker-a-7tpjd.c.openshift-qe.internal is reporting:
      "unexpected on-disk state validating against quay.io/xxx/xxx@sha256:........................:
      content mismatch for file \"/etc/mco-test-file\""'
    reason: 1 nodes are reporting degraded status on sync
    status: "True"
    type: NodeDegraded


5. Restore the backup that we made in step 3
sh-5.1# cp mco-test-file-back mco-test-file

Actual results:

The worker pool is degraded with this message

    - lastTransitionTime: "2023-08-31T12:24:47Z"
      message: 'Node sregidor-sr2-2gb5z-worker-a-q7wcb.c.openshift-qe.internal is
        reporting: "failed to update OS to quay.io/xxx/xxx@sha256:.......
        : error running rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/xxx/xxx@sha256:........:
        error: Old and new refs are equal: ostree-unverified-registry:quay.io/xxx/xxx@sha256:..............\n:
        exit status 1"'
      reason: 1 nodes are reporting degraded status on sync
      status: "True"
      type: NodeDegraded

Expected results:

The node pool should stop being degraded.
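The error message suggests that, once the file is restored, the daemon retries `rpm-ostree rebase` to the image the node is already running, and rpm-ostree rejects that no-op rebase. A guard that skips the rebase when the booted and desired images are the same would avoid it. The Python sketch below is hypothetical and is not the MCO's code (the actual fix is in the linked PR); the command shape is copied from the error message above, and `plan_os_update` is an illustrative name.

```python
from __future__ import annotations

TRANSPORT = "ostree-unverified-registry:"


def _strip_transport(ref: str) -> str:
    """Drop the ostree transport prefix so bare and prefixed pull specs compare equal."""
    return ref[len(TRANSPORT):] if ref.startswith(TRANSPORT) else ref


def plan_os_update(booted_image: str, desired_image: str) -> list[str] | None:
    """Return the rpm-ostree command to run, or None when no OS update is needed.

    Hypothetical helper: if the node already runs desired_image, calling
    'rpm-ostree rebase' would fail with "Old and new refs are equal".
    """
    booted = _strip_transport(booted_image)
    desired = _strip_transport(desired_image)
    if booted == desired:
        return None
    return ["rpm-ostree", "rebase", "--experimental", TRANSPORT + desired]
```

A caller would treat `None` as "already up to date" and clear the degraded condition instead of attempting the rebase.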

Additional info:

There is a link to the must-gather file in the first comment of this issue.

https://github.com/openshift/machine-config-operator/pull/3946

Bug OCPBUGS-18456: Passwords cannot be configured in OCB pools

Description of problem:

In OCB pools, when we create an MC to configure a password for the "core" user, the password is not configured.

Version-Release number of selected component (if applicable):

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-08-30-191617   True        False         5h38m   Cluster version is 4.14.0-0.nightly-2023-08-30-191617

How reproducible:

Always

Steps to Reproduce:

1. Enable on-cluster build on "worker" pool.
2. Create a MC to configure the "core" user password

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  creationTimestamp: "2023-09-01T09:51:14Z"
  generation: 1
  labels:
    machineconfiguration.openshift.io/role: worker
  name: tc-59417-test-core-passwd-tx2ndvcd
  resourceVersion: "105610"
  uid: 1f7a4de1-6222-4153-a46c-d1a17e5f89b1
spec:
  config:
    ignition:
      version: 3.2.0
    passwd:
      users:
      - name: core
        passwordHash: $6$uim4LuKWqiko1l5K$QJUwg.4lAyU4egsM7FNaNlSbuI6JfQCRufb99QuF082BpbqFoHP3WsWdZ5jCypS0veXWN1HDqO.bxUpE9aWYI1   # password coretest



3. Wait for the configuration to be built and applied

Actual results:

The password is not configured for the core user

In a worker node:

We can't log in using the new password:

$ oc debug node/sregidor-sr3-bfxxj-worker-a-h5b5j.c.openshift-qe.internal
Warning: metadata.name: this is used in the Pod's hostname, which can result in surprising behavior; a DNS label is recommended: [must be no more than 63 characters]
Starting pod/sregidor-sr3-bfxxj-worker-a-h5b5jcopenshift-qeinternal-debug-cb2gh ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.128.2
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-5.1# su core
[core@sregidor-sr3-bfxxj-worker-a-h5b5j /]$ su core
Password: 
su: Authentication failure


The password is not configured:

sh-5.1# cat /etc/shadow |grep core
systemd-coredump:!!:::::::
core:*:19597:0:99999:7:::

Expected results:

The password should be configured, and we should be able to log in to the nodes using the user "core" and the configured password.
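The `/etc/shadow` check above can also be done programmatically. In a shadow entry the second colon-separated field holds the password hash; `*` or a leading `!` means no usable password is set. Matching the user name exactly also avoids the extra `systemd-coredump` hit that `grep core` produces. A minimal Python sketch; `shadow_hash` and `has_password` are hypothetical helpers, and an empty hash field (passwordless login) is treated here as "no password configured".

```python
from __future__ import annotations


def shadow_hash(shadow_text: str, user: str) -> str | None:
    """Return the password field of `user` in /etc/shadow content, or None if absent."""
    for line in shadow_text.splitlines():
        fields = line.split(":")
        if len(fields) >= 2 and fields[0] == user:  # exact match, unlike `grep core`
            return fields[1]
    return None


def has_password(shadow_text: str, user: str) -> bool:
    """True if the user has a usable password hash (not empty, '*', or locked with '!')."""
    field = shadow_hash(shadow_text, user)
    return bool(field) and field[0] not in "!*"


observed = "systemd-coredump:!!:::::::\ncore:*:19597:0:99999:7:::\n"
print(has_password(observed, "core"))  # False: the MC password was not applied
```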

Additional info:

https://github.com/openshift/machine-config-operator/pull/3946

Feature OCPSTRAT-757: Add support for Tel Aviv region in AWS

Feature Overview (aka. Goal Summary)

Add support for il-central-1 in AWS

Goals (aka. expected user outcomes)

As a user, I am able to deploy OpenShift in il-central-1 in AWS, and this region is fully supported.

Requirements (aka. Acceptance Criteria):

A user can deploy OpenShift in AWS il-central-1 using all the supported installation tools for self-managed customers.

The support of this region is backported to the previous OpenShift EUS release.

The corresponding RHCOS image needs to be available in the new region so the Installer can list the region.

Background

AWS has added support for a new region in their public cloud offering and this region needs to be supported for OpenShift deployments as other regions.

Documentation Considerations

The information about the new region needs to be added to the documentation so this is supported.

Epic CORS-2770: Add support for il-central-1 in AWS

Epic Goal

Validate the Installer can deploy OpenShift on il-central-1 in AWS
Create the corresponding OCPBUGS issues for any problems found while validating this

Story CORS-2985: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/image-registry/pull/386

Story CORS-2975: Add support for il-central-1 in AWS

User Story:

As a (user persona), I want to be able to:

Capability 1
Capability 2
Capability 3

so that I can achieve

Outcome 1
Outcome 2
Outcome 3

Acceptance Criteria:

Description of criteria:

Upstream documentation
Point 1
Point 2
Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

(optional) https://github/com/link.to.enhancement/
(optional) https://issues.redhat.com/link.to.spike
Engineering detail 1
Engineering detail 2

This requires/does not require a design proposal.
This requires/does not require a feature gate.

https://github.com/openshift/installer/pull/7740

Feature OCPSTRAT-765: [Dev Preview] OC mirror multiple internet disconnected enclaves - MVP

The MVP aims to refactor MirrorToDisk and DiskToMirror for OCP releases:

Execute command line
Copy & untar release-index
Inspect untarred folder
gather release images from disk
Generate artifacts (icsp)
bulk pull-push payload images
gather release-index
unit test & e2e

Epic CLID-25: [Dev Preview] Enclave support for oc-mirror.

As an MVP, this epic covers the work for ~~RFE-3800~~ (includes ~~RFE-3393~~ and ~~RFE-3733~~) for mirroring releases.

The full description / overview of the enclave support is best described here

The design document can be found here

Upcoming epics, such as ~~CFE-942~~ will complete the RFE work with mirroring operators, additionalImages, etc.

Architecture Overview (diagram)

Story CFE-977: As a oc-mirror user, I want the tar generated by mirror to disk process to be as small as possible

As an oc-mirror user, I want the tar generated by the mirror-to-disk process to be as small as possible so that its transfer to the enclaves is as quick as possible.

Background:

After the team's first demo, the stakeholders rejected the initial solution, which consisted of archiving the whole cache and sending it through the one-way diode to the enclaves.

~~CFE-966~~ studied a solution to include in the tar:

only new images
only new blobs and manifests

This story is about implementing the studied solution.

Acceptance criteria:

Scenario 1: mirroring additional images - No previous history
- Create a first mirroring with additionalImage:
  quay.io/okd/scos-release:4.14.0-0.okd-scos-2023-10-19-123226
- MirrorToDisk should be successful
- The tar.gz should be available under working-dir
- The DiskToMirror based on the tar.gz generated above should be successful
- The destination registry should have this image, and I should be able to inspect it using skopeo inspect
Scenario 2: mirroring additional images - with previous history
- Create a second mirroring with additionalImage:
  quay.io/okd/scos-release:4.14.0-0.okd-scos-2023-10-19-111256
  - This can be with the same imageSetConfig of scenario1, to which we append this image
- MirrorToDisk should be successful
- The tar.gz should be available under working-dir
  - The tar.gz should not contain blobs 97da74cc6d8fa5d1634eb1760fd1da5c6048619c264c23e62d75f3bf6b8ef5c4 nor d8190195889efb5333eeec18af9b6c82313edd4db62989bd3a357caca4f13f0e (previously mirrored)
- The size of the tar.gz should be smaller than the first tar.gz
- The DiskToMirror based on the tar.gz generated above should be successful
- The destination registry should have this new image, and I should be able to inspect it by tag using skopeo inspect
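The behavior expected in scenario 2 amounts to a set difference between the blobs referenced by the current mirroring and those already shipped in previous archives. A minimal Python sketch, assuming the history of previously mirrored blob digests is available as a set (how oc-mirror actually persists that history is not shown here); `blobs_to_archive` is a hypothetical name.

```python
from __future__ import annotations

from collections.abc import Iterable


def blobs_to_archive(current: Iterable[str], history: set[str]) -> list[str]:
    """Return digests referenced now but not mirrored before, first-seen order, no duplicates."""
    selected: list[str] = []
    seen: set[str] = set()
    for digest in current:
        if digest in history or digest in seen:
            continue
        seen.add(digest)
        selected.append(digest)
    return selected


# Digests named in scenario 2 as already mirrored by the first run.
previously_mirrored = {
    "97da74cc6d8fa5d1634eb1760fd1da5c6048619c264c23e62d75f3bf6b8ef5c4",
    "d8190195889efb5333eeec18af9b6c82313edd4db62989bd3a357caca4f13f0e",
}
# "newlayer" is a placeholder for a blob that only the second image references.
print(blobs_to_archive([*sorted(previously_mirrored), "newlayer"], previously_mirrored))  # ['newlayer']
```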

Sub-task CFE-991: Integrate all the building blocks together MirrorToDisk

Stop registry
call blobGatherer on all collected images
call archiver to generate the archive

https://github.com/openshift/oc-mirror/pull/739

Sub-task CFE-980: tar packer

I need an implementation (and interface) that constructs a tar.gz from:

the contents of working-dir (excluding existing tars)
the repositories folder from the cache's localstorage
all blobs listed in the list that the function gets from the input
the imagesetconfig

Tar contents:

docker/v2/repositories : manifests for all mirrored images
docker/v2/blobs/sha256 : blobs that haven't been mirrored (diff)
working-dir
image set config
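A minimal Python sketch of such a packer using the standard `tarfile` module. It assumes the cache root contains `docker/v2/repositories` and that the caller passes the relative paths of the blobs selected by the diff; the function name and parameters are hypothetical, and this is not oc-mirror's Go implementation.

```python
from __future__ import annotations

import tarfile
from pathlib import Path


def pack(archive: Path, cache_root: Path, working_dir: Path, image_set_config: Path, blob_paths: list[str]) -> None:
    """Build a mirror archive with manifests, diff blobs, working-dir and the image set config.

    blob_paths are relative to cache_root (e.g. "docker/v2/blobs/sha256/<digest>") and
    name only the blobs selected by the diff; other cached blobs are left out.
    """
    with tarfile.open(str(archive), "w:gz") as tar:
        repositories = cache_root / "docker" / "v2" / "repositories"
        if repositories.exists():
            # manifests for all mirrored images
            tar.add(str(repositories), arcname="docker/v2/repositories")
        for rel in blob_paths:
            # only blobs that have not been mirrored before
            tar.add(str(cache_root / rel), arcname=rel)
        # working-dir, excluding existing archives (including the one being written)
        tar.add(
            str(working_dir),
            arcname="working-dir",
            filter=lambda info: None if info.name.endswith(".tar.gz") else info,
        )
        tar.add(str(image_set_config), arcname=image_set_config.name)
```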

Diff logic:

https://github.com/openshift/oc-mirror/pull/734

Sub-task CFE-981: tar unpacker

I need an implementation (and interface) that can be used at the beginning of the diskToMirror process (important: before starting the local cache) and that extracts each part of the tar.gz to its location:

docker/v2/repositories => local registry cache storage
docker/v2/blobs/sha256 => local registry cache storage
working-dir => working-dir folder
hidden history file => under working-dir
image set config => under working-dir
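A minimal Python sketch of the unpacker, mirroring the mapping above: entries under `docker/v2/` go to the cache storage, `working-dir/` entries go to the working-dir folder, and any other top-level file (assumed to be the history file or the image set config) lands under working-dir. It rejects entries that would escape their destination and skips links and special files. Function and parameter names are hypothetical; `Path.is_relative_to` requires Python 3.9+.

```python
from __future__ import annotations

import shutil
import tarfile
from pathlib import Path


def unpack(archive: Path, cache_dir: Path, working_dir: Path) -> None:
    """Extract a mirror archive: docker/v2/* into cache_dir, everything else into working_dir."""
    cache_root, work_root = cache_dir.resolve(), working_dir.resolve()
    with tarfile.open(str(archive), "r:gz") as tar:
        for member in tar.getmembers():
            name = member.name
            if name.startswith("docker/v2/"):
                base, rel = cache_root, name  # repositories and blobs keep their path
            elif name == "working-dir" or name.startswith("working-dir/"):
                base, rel = work_root, name[len("working-dir"):].lstrip("/")
            else:
                base, rel = work_root, name  # e.g. history file, image set config
            dest = (base / rel).resolve()
            if not dest.is_relative_to(base):
                raise ValueError(f"refusing to extract {name!r} outside {base}")
            if member.isdir():
                dest.mkdir(parents=True, exist_ok=True)
            elif member.isfile():
                dest.parent.mkdir(parents=True, exist_ok=True)
                source = tar.extractfile(member)
                with source, open(dest, "wb") as out:
                    shutil.copyfileobj(source, out)
            # symlinks, devices, etc. are skipped in this sketch
```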

https://github.com/openshift/oc-mirror/pull/738

Sub-task CFE-994: cache location separate of working-dir

Definitions:

cache-dir: path to the folder containing docker/v2/repositories, etc., which can be used by oc-mirror's local cache (local registry)
working-dir: path to the folder containing custom resources (IDMS, ITMS, catalogSource, etc.), history metadata, archives, and some catalog and release manifests.

Requirement
AS the admin of several clusters in my company, with several enclaves involved,
WHEN doing MirrorToDisk for several enclaves,
I would like to be able to reuse the cache-dir of my main environment (enterprise level) for all enclaves,
AND use a separate working-dir for each enclave,
SO THAT I can save storage volume and gain performance, while preserving a separate context for each enclave

https://github.com/openshift/oc-mirror/pull/741

Story CFE-899: As a user I want to be able to use oc-mirror in both partial and fully disconnected install scenarios so that I can generate relevant metadata useful for validation and implementation for release images according to the requirements for enclave support

Acceptance Criteria
- All tests pass (80% coverage)
- Documentation approved by docs team
- All tests QE approved
- Well documented README/HOWTO/OpenShift documentation

Tasks
- Implement V2 versioning as discussed in the EP document
- Implement code to generate artifacts IDMS ITMS and audit data
- Implement unit tests

https://github.com/openshift/oc-mirror/pull/718

Sub-task CFE-958: removal of registry domain name from the mirroring

To keep the same behavior as v1, the mirroring needs to happen at the namespace level, without the source registry's domain name.
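A minimal Python sketch of that mapping, assuming the v1 behavior is "destination registry + source path without the source registry host". It uses the common container-reference convention that a first path component containing `.` or `:`, or equal to `localhost`, is a registry host. `mirror_destination` and the `mirror.example.com:5000` registry are hypothetical.

```python
from __future__ import annotations


def mirror_destination(source: str, dest_registry: str) -> str:
    """Map a source image reference to its mirror, dropping the source registry host.

    The namespace, repository and tag/digest are kept as-is.
    """
    first, sep, rest = source.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        path = rest  # first component is a registry host: drop it
    else:
        path = source  # no explicit registry host in the reference
    return f"{dest_registry.rstrip('/')}/{path}"


print(mirror_destination(
    "quay.io/okd/scos-release:4.14.0-0.okd-scos-2023-10-19-123226",
    "mirror.example.com:5000",
))
```

The namespace (`okd`) survives the mapping, so images from different namespaces of the same source registry do not collide in the destination.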

https://github.com/openshift/oc-mirror/pull/718

Story CFE-897: As a user I want to be able to use oc-mirror cli in both partial and fully disconnected install scenarios so that I can mirror all images according to the requirements for enclave support

Acceptance Criteria
- All unit tests pass (80% coverage)
- Documentation approved by docs team
- All tests QE approved
- Well documented README/HOWTO/OpenShift documentation

Tasks
- Implement V2 versioning as discussed in the EP document
- Update cli to switch to v2 and use new cli parameters
- Hide the logs of the local storage registry in a different file
- Add a flag to the CLI to choose the port for the local storage registry
- Implement code for new cli (Executor) and relevant flags
- Implement unit tests

https://github.com/openshift/oc-mirror/pull/692

Sub-task CFE-906: As a user viewing oc-mirror logs, I don't want to be submerged by the logs generated by the local storage

Logs from oc-mirror's local storage registry need to be redirected to a separate log file.

https://github.com/openshift/oc-mirror/pull/686

Sub-task CFE-907: As a user of oc-mirror, I want to be able to switch to using the enclave compatible tech preview

oc-mirror v1 and v2 can coexist.
By default, the v1 code is called.
oc-mirror switches to v2 when a specific flag is passed.

https://github.com/openshift/oc-mirror/pull/692

Story CFE-971: As a oc-mirror user, I'd like oc-mirror to create and mirror a graph image for ocp releases

When graph: true is specified for releases in the imageSetConfig, I'd like oc-mirror to create and mirror a graph image for ocp releases

https://github.com/openshift/oc-mirror/pull/722

Story CFE-965: Enable signature verification

Check how to enable signature verification using the skopeo mod (copy method)

https://github.com/openshift/oc-mirror/pull/709

Story CFE-900: As a user I want to be able to use oc-mirror in both partial and fully disconnected install scenarios so that I can bulk copy/mirror release images according to the requirements for enclave support

Acceptance Criteria
- All tests pass (80% coverage)
- Documentation approved by docs team
- All tests QE approved
- Well documented README/HOWTO/OpenShift documentation

Tasks
- Implement V2 versioning as discussed in the EP document
- Implement code to bulk (use concurrency) push-pull release images
- Implement unit tests
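A minimal Python sketch of bounded-concurrency bulk copying, assuming a `copy(src, dst)` callable that performs one image copy (a hypothetical parameter standing in for whatever copy logic oc-mirror uses). Failures are collected per source image instead of aborting the whole batch; the default of 8 workers is an arbitrary choice.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable
from concurrent.futures import ThreadPoolExecutor, as_completed


def bulk_copy(
    pairs: Iterable[tuple[str, str]],
    copy: Callable[[str, str], None],
    max_workers: int = 8,
) -> dict[str, BaseException]:
    """Copy (source, destination) image pairs concurrently; return failures keyed by source."""
    failures: dict[str, BaseException] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(copy, src, dst): src for src, dst in pairs}
        for future in as_completed(futures):
            error = future.exception()
            if error is not None:
                failures[futures[future]] = error
    return failures
```

Returning the failures lets the caller retry only the images that did not make it, rather than re-running the whole release payload.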

Sub-task CFE-956: as a oc-mirror user, I should be able to pull mirrored images by digest

https://github.com/openshift/oc-mirror/pull/695

Sub-task CFE-935: oc-mirror uses the local storage registry in order to facilitate mirroring releases

https://github.com/openshift/oc-mirror/pull/690

Feature OCPSTRAT-767: Add support to NAT Gateway as outboundType for clusters in Azure (GA)

Feature Overview (aka. Goal Summary)

Add support for NAT Gateways in Azure when deploying OpenShift on this cloud, to manage the outbound network traffic, and make this the default option for new deployments

Goals (aka. expected user outcomes)

While deploying OpenShift on Azure, the Installer will configure NAT Gateways as the default method to handle outbound network traffic, to prevent the SNAT port exhaustion issues related to the outboundType configured by default today.

Requirements (aka. Acceptance Criteria):

The installer will use the NAT Gateway object from Azure to manage the outbound traffic from OpenShift.

The installer will create a NAT Gateway object per AZ in Azure so the solution is HA.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Background

Using NAT Gateway for egress traffic is the recommended approach from Microsoft

This is also a common request from enterprise customers, who are hitting SNAT port exhaustion issues with the solution OpenShift currently uses for outbound traffic management in Azure.

Interoperability Considerations

Epic OCPCLOUD-2131: Add support to multiple subnets in Control Plane Machine Sets

Epic Goal

Allow Control Plane Machine Sets to specify multiple Subnets in Azure to support NAT Gateways for egress traffic

Why is this important?

To avoid SNAT port exhaustion issues in Azure, Microsoft recommends using NAT Gateways for outbound traffic management. As part of enabling NAT Gateway support, the CPMS objects need to be able to support multiple subnets.

Scenarios

One Nat Gateway per Availability Zone
One Subnet per Availability Zone
Multiple Subnets in multiple Availability Zones
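The scenarios above pair each availability zone with its own subnet (and therefore its own NAT Gateway). A minimal Python sketch of spreading control plane machines across such zone/subnet pairs round-robin; the `FailureDomain` type, `place_machines`, the subnet names and the round-robin policy are illustrative assumptions, not the Control Plane Machine Set API or its placement logic.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FailureDomain:
    """One availability zone paired with the subnet (and thus NAT Gateway) used in it."""

    zone: str
    subnet: str


def place_machines(count: int, domains: list[FailureDomain]) -> list[FailureDomain]:
    """Spread `count` control plane machines across failure domains round-robin."""
    if not domains:
        raise ValueError("at least one failure domain is required")
    return [domains[i % len(domains)] for i in range(count)]


domains = [
    FailureDomain("1", "cp-subnet-1"),
    FailureDomain("2", "cp-subnet-2"),
    FailureDomain("3", "cp-subnet-3"),
]
for index, domain in enumerate(place_machines(3, domains)):
    print(f"master-{index}: zone {domain.zone}, subnet {domain.subnet}")
```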

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.

Dependencies (internal and external)

This work depends on the work done in ~~CORS-2564~~

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Story OCPCLOUD-2130: Azure: Allow multiple subnets in Control Plane Machine Sets

User Story:

As a user, I want to be able to:

Allow Control Plane Machine Sets to specify multiple Subnets

so that I can achieve

One Nat Gateway per Availability Zone
One Subnet per Availability Zone
Multiple Subnets in multiple Availability Zones

Acceptance Criteria:

The ability to specify multiple Subnets

This requires/does not require a design proposal.
This requires/does not require a feature gate.

https://github.com/openshift/installer/pull/7617

Feature OCPSTRAT-791: Prevent must-gather from filling up master node

Feature Overview (aka. Goal Summary)

As an OpenShift admin, I want to prevent must-gather from filling up the disk. Because must-gather runs on a master node, filling up that node's disk can cause problems on the master node and affect the stability of my OCP environment.

Goals (aka. expected user outcomes)

The observable functionality that the user now has as a result of receiving this feature. Complete during New status.

Epic WRKLDS-859: Define configurable default limit to emptydir volume in must gather pod

Define configurable default limit to emptydir volume in must gather pod
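A minimal Python sketch of the idea behind such a limit: gathering stops once the output directory reaches a configurable percentage of the volume size, while everything already gathered is kept. The names, the 30% value used in the test, and the check-before-each-step strategy are assumptions for illustration, not the `oc adm must-gather` implementation.

```python
from __future__ import annotations

import os
from collections.abc import Callable
from pathlib import Path


def dir_size(path: Path) -> int:
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def gather(steps: list[Callable[[Path], None]], out_dir: Path, volume_bytes: int, max_percent: float) -> int:
    """Run gathering steps until out_dir uses max_percent of volume_bytes; return steps run."""
    completed = 0
    for step in steps:
        # Checked before each step, so a single large step can still overshoot the limit.
        if dir_size(out_dir) * 100 >= volume_bytes * max_percent:
            break
        step(out_dir)
        completed += 1
    return completed
```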

Story WRKLDS-860: Define configurable default limit to emptydir volume in must gather pod

https://github.com/openshift/oc/pull/1533

Feature OCPSTRAT-798: New Infrastructure Capabilities annotations

Feature Overview (aka. Goal Summary)

Customers can trust the metadata in our operator catalogs to reason about infrastructure compatibility and interoperability. Similar to OCPPLAN-7983, the requirement is that this data is present for every layered product and Red Hat-released operator, and ideally also for ISV operators.

Today it is hard to validate the presence of this data due to the metadata format. This feature tracks introducing a new format, implementing the appropriate validation and enforcement of its presence, and defining a grace period in which both formats are acceptable.

Goals (aka. expected user outcomes)

Customers can rely on the operator metadata as the single source of truth for capability and interoperability information instead of having to look up product-specific documentation. They can use this data to filter in on-cluster and public catalog displays as well as in their pipelines or custom workflows.

Red Hat Operators are required to provide this data and we aim for near 100% coverage in our catalogs.

Absence of this data can reliably be detected and will subsequently lead to gating in the release process.

Requirements (aka. Acceptance Criteria):

discrete annotations per feature that can be checked for presence as well as for positive and negative values (see PORTENABLE-525)
support in the OCP console and RHEC for both the new and the older metadata annotation formats
enforcement in ISV and RHT operator release pipelines
- first with non-fatal warnings
- later with blocking behavior if annotations are missing
- the presence of ALL annotations needs to be checked in all pipelines / catalogs

Questions to Answer:

when can we rollout the pipeline tests?
- only when there is support for visualization in the OCP Console and catalog.redhat.com
should operator authors use both, old and new annotations at the same time?
- they can, but there is no requirement to do so; once the support in the console and RHEC is there, the pipelines will only check for the new annotations
what happens to older OCP releases that don't support the new annotations yet?
- the only piece in OCP that is aware of the annotations is the console, and we plan to backport the changes all the way to 4.10

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

we first need internal documentation for RHT and ISV teams that need to implement the change
when RHEC and the console are ready, we will update the external documentation and can point to it as the official source of truth

Interoperability Considerations

OCP Console will have to support the new format (see ~~CONSOLE-3688~~) in parallel to the old format (as fallback) in all currently supported OCP versions

Epic CONSOLE-3688: Support for new operator annotations format

Epic Goal

Transparently support old and new infrastructure annotations format delivered by OLM-packaged operators

Why is this important?

As part of OCPSTRAT-288, we are looking to improve the metadata quality of Red Hat operators in OpenShift
via ~~PORTENABLE-525~~ we are defining a new metadata format that supports the aforementioned initiative with more robust detection of individual infrastructure features via boolean data types

Scenarios

A user can use the OCP console to browse through the OperatorHub catalog and filter for all the existing and new annotations defined in ~~PORTENABLE-525~~
A user reviewing an operator's detail can see the supported infrastructures transparently regardless if the operator uses the new or the existing annotations format

Acceptance Criteria

the new annotation format is supported in operatorhub filtering and operator details pages
the old annotation format keeps being supported in operatorhub filtering and operator details pages
the console will respect both the old and the new annotations format
when an operator specifies a particular feature in both the old and the new annotation format, the annotation in the newer format takes precedence
the newer infrastructure features from ~~PORTENABLE-525~~ tls-profiles and token-auth/* do not have equivalents in the old annotation format and evaluation doesn't need to fall back as described in the previous point
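The precedence rules above can be sketched as a small resolver. This Python version is an illustration, not the console's code: it assumes the legacy format is a JSON list under `operators.openshift.io/infrastructure-features` and the new format is one `features.operators.openshift.io/<feature>` annotation per feature with a `"true"`/`"false"` string value. It returns `None` when there is no data, which is the absence the new format is meant to make detectable.

```python
from __future__ import annotations

import json

# Assumed annotation keys: the legacy JSON-list annotation and the new per-feature booleans.
LEGACY_KEY = "operators.openshift.io/infrastructure-features"
NEW_PREFIX = "features.operators.openshift.io/"


def _legacy_features(annotations: dict[str, str]) -> set[str] | None:
    """Parse the legacy JSON list of feature names; None if absent or malformed."""
    raw = annotations.get(LEGACY_KEY)
    if raw is None:
        return None
    try:
        values = json.loads(raw)
    except ValueError:
        return None
    if not isinstance(values, list):
        return None
    return {str(value).strip().lower() for value in values}


def resolve_feature(annotations: dict[str, str], feature: str) -> bool | None:
    """True/False if the operator declares the feature, None if there is no data."""
    new_value = annotations.get(NEW_PREFIX + feature)
    if new_value is not None:
        return new_value.strip().lower() == "true"  # the new format takes precedence
    if feature == "tls-profiles" or feature.startswith("token-auth"):
        return None  # no legacy equivalent, so no fallback
    legacy = _legacy_features(annotations)
    if legacy is None:
        return None
    return feature in legacy
```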

Dependencies (internal and external)

none

Open Questions

due to the non-intrusive nature of this feature, can we ship it in a 4.14.z patch release?

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Bug OCPBUGS-21881: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Feature OCPSTRAT-823: MCO skips reboot when config matches during bootstrap pivot

Feature Overview (aka. Goal Summary)

In the MCO today, we always reboot during the initial node bootstrap pivot. This is because the machine-config CRD also manages non-Ignition fields like OSImageURL, kernelArguments, extensions, etc. Any update to these fields requires a node reboot.
Most of the time, the cluster OSImageURL is different from the boot image, so the pivot results in a node reboot.

However, there are certain use cases (like bare metal) where we would like to skip this reboot to bring up the node faster. Even when the cluster admin boots the node with a boot image matching the cluster OSImageURL, it still leads to a reboot.

Epic MCO-706: MCO skips reboot when configuration matches during node bootstrap pivot

In the MCO today, we always reboot during the initial node bootstrap pivot. This is because the machine-config CRD also manages non-Ignition fields like OSImageURL, kernelArguments, extensions, etc. Any update to these fields requires a node reboot.
Most of the time, the cluster OSImageURL is different from the boot image, so the pivot results in a node reboot.

However, there are certain use cases (like bare metal) where we would like to skip this reboot to bring up the node faster. Even when the cluster admin boots the node with a boot image matching the cluster OSImageURL, it still leads to a reboot.

For this effort, two areas have been identified where the MCO could be improved to skip the reboot during the initial pivot:

Bootimage matches cluster OSImageURL
kernelArgs supplied to MachineConfig can be applied during node provisioning

Note: Based on additional findings from the Assisted Installer team, the scope of work has been reframed to meet the requirements of the Assisted Installer workflow.

Story MCO-731: Retry ovs-configuration.service if file /etc/ignition-machine-config-encapsulated.json exists

View the Description View the linked PRs

The systemd service ovs-configuration.service is skipped if the file /etc/ignition-machine-config-encapsulated.json exists, because the service assumes that a reboot will follow whenever the file exists.
When we want to skip the reboot, we must ensure that the service is not skipped. Therefore, the service will retry the configuration until the file no longer exists.

https://github.com/openshift/machine-config-operator/pull/3858

Story MCO-708: kernelArgs supplied to MachineConfig can be applied during node provisioning

View the Description View the linked PRs

After https://github.com/openshift/machine-config-operator/pull/3814 merges, it will be possible to use the kernel arguments functionality that has been introduced in Ignition. We can use this to sync the kernelArgs supplied through a MachineConfig into the corresponding Ignition field. As a result, MachineConfig-supplied kargs are available when the node boots, and no reboot is required.

Acceptance Criteria:

KernelArgs supplied through MachineConfig are applied without reboot during node bootstrap
MCD reconciles successfully with this change
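The syncing step can be sketched as copying MachineConfig kernel arguments into Ignition's `kernelArguments.shouldExist` list. This is a minimal Go sketch under stated assumptions: `MachineConfigSpec` and `IgnitionKernelArguments` are simplified, hypothetical mirrors of the real MCO and Ignition types, and `syncKernelArgs` is not the function used in PR 3856.

```go
package main

// MachineConfigSpec is a simplified, hypothetical mirror of the MachineConfig
// field relevant here; the real MCO type carries many more fields.
type MachineConfigSpec struct {
	KernelArguments []string
}

// IgnitionKernelArguments mirrors the shape of Ignition's kernelArguments
// section, which lists arguments that should or should not be present.
type IgnitionKernelArguments struct {
	ShouldExist    []string
	ShouldNotExist []string
}

// syncKernelArgs appends MachineConfig kernel arguments to the Ignition
// shouldExist list, preserving order and skipping exact duplicates, so the
// arguments are applied at first boot instead of after a reboot.
func syncKernelArgs(mc MachineConfigSpec, ign *IgnitionKernelArguments) {
	seen := make(map[string]bool, len(ign.ShouldExist))
	for _, arg := range ign.ShouldExist {
		seen[arg] = true
	}
	for _, arg := range mc.KernelArguments {
		if !seen[arg] {
			ign.ShouldExist = append(ign.ShouldExist, arg)
			seen[arg] = true
		}
	}
}
```

Dropping exact duplicates is a simplifying assumption; arguments that differ only in value (for example two distinct `console=` entries) are kept.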

https://github.com/openshift/machine-config-operator/pull/3856

Story MCO-707: skip reboot when bootimage matches cluster OSImageURL

View the Description View the linked PRs

After PR https://github.com/openshift/os/pull/657 lands, RHCOS nodes booting from a boot image will have a digested pull spec available in the `container-image-reference-digest` field of `rpm-ostree status`.

With this, we can teach the MCD to compare against container-image-reference-digest when OSImageURL is not available (which is the case when a node boots from a boot image). When they match, the boot image and the OCP cluster have the same OS content, and we can safely skip the node reboot during the initial pivot.
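The comparison can be sketched as a digest check. This minimal Go sketch assumes that identical `sha256:` digests imply identical OS content and that the booted value may be either a bare digest or a full pull spec; `digestOf` and `canSkipInitialReboot` are hypothetical names, not MCD functions.

```go
package main

import "strings"

// digestOf extracts the "sha256:..." digest from either a bare digest or a
// pull spec of the form repo@sha256:<hex>. It returns "" for tag-based or
// empty references, which cannot be compared by digest.
func digestOf(ref string) string {
	if i := strings.LastIndex(ref, "@"); i >= 0 {
		ref = ref[i+1:]
	}
	if !strings.HasPrefix(ref, "sha256:") {
		return ""
	}
	return ref
}

// canSkipInitialReboot reports whether the booted OS content already matches
// the cluster OSImageURL. bootedRef is assumed to come from the
// container-image-reference-digest field of `rpm-ostree status`.
func canSkipInitialReboot(clusterOSImageURL, bootedRef string) bool {
	want := digestOf(clusterOSImageURL)
	return want != "" && want == digestOf(bootedRef)
}
```

Comparing only the digest, not the repository, is deliberate: the same content can be mirrored under different registry paths.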

Note: The scope of this work has been reduced to PR https://github.com/openshift/machine-config-operator/pull/3857, as this is sufficient for the Assisted Installer use case today.

https://github.com/openshift/machine-config-operator/pull/3857

Feature OCPSTRAT-834: Agent Installer parity with multi payload

View the Description

Feature Overview (aka. Goal Summary)

An SRE/cluster admin will be able to use the multi payload in the same way as a single-arch payload for a single-arch cluster when installing with the agent installer. This is most useful when using the agent installer in disconnected environments.

Goals (aka. expected user outcomes)

The agent installer will work for installs involving the multi-arch payload

Requirements (aka. Acceptance Criteria):

The agent installer will work for installs involving the multi-arch payload

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

You will not be able to install multi-architecture clusters; nodes of a different architecture will need to be added as a day 2 operation.

Background

Provide any additional context needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.

Interoperability Considerations

Epic MULTIARCH-4258: Agent Installer parity with multi payload

View the Description

As an SRE/cluster admin, I expect that the multi payload can be used the same way as a single-arch payload for a single-arch cluster when installing with the agent installer

AC:

Create arm64 iso from an x86 client (and vice versa)

non-goal:

Create a cluster with mixed-arch nodes at install time (i.e., an x86 control plane with ARM nodes)

Story MIXEDARCH-310: Enable the use of the multi payload for agent installer

View the Description View the linked PRs

The current agent code uses `oc` to extract files from the release payload. We need to add `--filter-by-os` to the appropriate `oc` templates to ensure that a user can create an agent.<arch>.iso with the multi payload on any supported client architecture.
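A minimal Go sketch of how a target architecture could be turned into a `--filter-by-os` value for an `oc image extract` call. It assumes the extraction uses `oc image extract` with `--path SRC:DEST`; the accepted architecture names, `filterByOS`, and `imageExtractArgs` are hypothetical and are not taken from PR 7595.

```go
package main

import "fmt"

// filterByOS maps an architecture name to the "os/arch" value accepted by
// oc's --filter-by-os flag. The accepted input names are assumptions.
func filterByOS(arch string) (string, error) {
	switch arch {
	case "x86_64", "amd64":
		return "linux/amd64", nil
	case "aarch64", "arm64":
		return "linux/arm64", nil
	case "ppc64le":
		return "linux/ppc64le", nil
	case "s390x":
		return "linux/s390x", nil
	default:
		return "", fmt.Errorf("unsupported architecture %q", arch)
	}
}

// imageExtractArgs builds a hypothetical `oc image extract` argument list that
// selects the manifest for the target architecture from a multi payload,
// independent of the architecture of the client running the command.
func imageExtractArgs(image, srcPath, destDir, arch string) ([]string, error) {
	filter, err := filterByOS(arch)
	if err != nil {
		return nil, err
	}
	return []string{
		"image", "extract", image,
		"--path", srcPath + ":" + destDir,
		"--filter-by-os", filter,
		"--confirm",
	}, nil
}
```

Without the filter, `oc` resolves a manifest list to the client's own platform, which is why an x86 client would otherwise pull x86 content when building an arm64 ISO.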

https://github.com/openshift/installer/pull/7595

Feature OCPSTRAT-842: Improve console UI experience for creating Serverless Functions

View the Description

Feature Overview (aka. Goal Summary)

Provide a unified view of all options to create Serverless Functions so developers can get quickly started with their preferred option.

Goals (aka. expected user outcomes)

A single UI panel in ODC that shows in-product and off-product choices to create Serverless functions, and provides an in-product creation experience for applicable choices.

Requirements (aka. Acceptance Criteria):

A card on the Add UI panel to create Serverless Functions
A single UI panel linked to that card that shows list of options
An in-product UI experience for "Import from Git" and "Run container image" choices.
An in-product experience for CLI with Web Terminal
Static pages with directions for off-product choice of IDE extension
Quick Starts for Web Terminal

Use Cases (Optional):

Questions to Answer (Optional):

Out of Scope

An inline editor is out of scope.

Background

Customer Considerations

Documentation Considerations

Interoperability Considerations

Epic ODC-7370: DevConsole Experience for Serverless Functions

View the Description

Background: We need to provide the Dev Console experience for Serverless Functions.

Miro Board for inspiration and vision:

https://miro.com/app/board/uXjVPNtkJCI=/

The acceptance criteria are in the parent feature.

Story ODC-7399: Create getting started content in functions list page

View the Description View the linked PRs

Description

Create getting started content on the Functions list page and add links to the CLI, IDE extensions, and samples. (The design is yet to be provided, so start with exploration and dummy data.)

Acceptance Criteria

Remove the IDE extension cards from the Create serverless function form
Add a getting started content component at the top of the Functions list page
Add a Create button on the Functions list page that takes the user to the existing Create serverless function form from the Import from Git page
Add three links to the getting started content: CLI, IDE extensions, and samples
Update existing e2e tests and add new tests

Additional Details:

Design is yet to be provided, so start with exploration and with dummy data

https://github.com/openshift/console/pull/13289

Story ODC-7377: Add Functions tab to the side navigation menu in Developer perspective

View the Description View the linked PRs

Description

In the Developer perspective, add a Functions tab inside the Resources section of the left side navigation menu. It lists all the serverless functions created in the selected namespace, and clicking a function opens the Service details tab.

Acceptance Criteria

Add a Functions tab inside the Resources section of the left navigation menu
When the user navigates to the Functions page, list all the serverless functions associated with that namespace
When the user clicks a function name, show the Details and YAML tabs as currently available on the Service details page
On the details page, add the function URL (already shown on the list page) and add a container section before Conditions (as on the Deployment details page)
Add e2e tests

Additional Details:

https://github.com/openshift/console/pull/13174

Story ODC-7425: Add serverless function icon in Add page group header

View the Description View the linked PRs

AC: Add a serverless function icon to the Add page group header (see the attached image for reference)

Slack thread - https://redhat-internal.slack.com/archives/C05MDC1T35J/p1700149065933539

https://github.com/openshift/console/pull/13338

Story ODC-7378: Add Revisions, Routes and Pods tab for service details page

View the Description View the linked PRs

Description

When the user navigates to the Functions page and clicks any function, we should add Revisions, Routes, and Pods tabs, listing the resources associated with that function, alongside the existing Details and YAML tabs.

Acceptance Criteria

Add a Revisions tab that lists the Revisions associated with the function
Add a Routes tab that lists the Routes associated with the function
Add a Pods tab that lists the Pods associated with the function
Verify that breadcrumbs, resource links, and all other links on the list pages work
Add e2e tests

Additional Details:

https://github.com/openshift/console/pull/13174

Story ODC-7428: Update quick start name and document links for getting started section

View the Description View the linked PRs

Update quick start name and document links for getting started section based on information provided in document https://docs.google.com/document/d/1xy9GwGR5m4p9W_RJ8Wt_164LpCP57x-lNJlObGfBJ9M/edit#heading=h.beapmz2o0lv7

https://github.com/openshift/console/pull/13348

Bug OCPBUGS-23559: Styling issue in functions list page after PatternFly upgrade

View the Description View the linked PRs

Description of problem:

    Styling issue in functions list page after PatternFly upgrade

Version-Release number of selected component (if applicable):

    4.15.0

How reproducible:

    Always

Steps to Reproduce:

    1. Install serverless operator
    2. Create knative-serving instance
    3. Go to Functions menu in Dev perspective

Actual results:

    https://github.com/openshift/console/pull/13348#issuecomment-1822469023

Expected results:

The page should be styled correctly.

Additional info:

https://github.com/openshift/console/pull/13356

Feature OCPSTRAT-843: Provide better insights into OpenShift Pipeline runs

View the Description

Feature Overview (aka. Goal Summary)

Provide better insights into performance and frequency of OpenShift pipeline runs

Goals (aka. expected user outcomes)

Show historical and real-time pipeline run data in a unified UI panel, with drill down capabilities.

Requirements (aka. Acceptance Criteria):

Provide a visual dashboard that is competitive with leading pipeline solutions

Provide access to logs of running and historical pipeline runs

Enable in-context links to manage pipeline definitions.

Apply RBAC policies to data access.

Use Cases (Optional):

Questions to Answer (Optional):

Out of Scope

Background

Customer Considerations

Documentation Considerations

Interoperability Considerations

Epic ODC-7348: Pipeline Dashboard

View the Description

Outcomes:

Dashboard is available in Admin perspective in OpenShift Console

Epic goals:

Provide a Prow dashboard similar to this: https://prow.k8s.io/

Additionally, the goal is that this work will be the beginning of the dynamic plugin for OpenShift Pipelines, which would be installed by the OpenShift Pipelines operator.

Why is it important?

Customers want an admin-centric view of all PipelineRuns (PLRs) across namespaces.
Customers want to see historical data on past PLRs and get metrics on success/failure rate, time taken, etc.
Currently, neither the OpenShift console nor the Tekton dashboard provides a way to correlate PRs/MRs with PLRs.
A single place to view the parameters, artifacts, and YAML of a PLR is desired.
Customers want to derive industry-standard DevOps metrics (DORA) from historical PLRs on a repo/namespace.
Community investment in the Tekton Dashboard will not be on par with the incoming customer requirements.

Acceptance criteria:

Dashboard is available in Admin perspective in OpenShift Console

Dependencies (External/Internal):

Designs

From DTUX-1514: https://www.figma.com/file/RuEcz3C1AywqhVvKHEb16y/Pipelines-Overview-Dashboard?type=design&node-id=0-2&mode=design&t=FSfeZYBYDBIvLsbi-0

Story ODC-7396: Contribute Pipeline metrics tab using the dynamic plugin

View the Description View the linked PRs

Description

As a user, I want to contribute the Pipeline metrics page using the dynamic plugin.

Acceptance Criteria

Add metrics tab to Pipeline Details page using Dynamic plugin

Additional Details:

https://github.com/openshift/console/pull/13225

Epic ODC-7347: Support to load PipelineRuns and Logs also from Tekton Results

View the Description

Outcomes:

Pipeline history and logs are available in dashboard for an extended time window without requiring PipelineRun CRs on the cluster

Epic goals

Extend availability of PipelineRun history and logs beyond availability of respective custom resources on the cluster

Why is it important?

Customers require access to pipeline history and logs for extended periods of time due to audit and troubleshooting requirements. Access to pipeline history is currently limited to the custom resources available on the cluster, which places an enormous burden on etcd, storage, and the OpenShift API server, reducing overall cluster performance.

Acceptance criteria

Pipeline history in the dashboard displays all PipelineRuns available in the Tekton Results data
When user clicks on a PipelineRun, the details and logs are displayed based on the data from Tekton Results

Story ODC-7383: Use Tekton Results API along with k8s API to get TaskRuns list

View the Description View the linked PRs

Description

As a user, I want to see the TaskRuns from the Tekton Results data source.

Acceptance Criteria

In the admin perspective, update the TaskRuns list page to use the Tekton Results API and k8s API to get the TaskRuns.
In the dev and admin perspectives, update the TaskRuns list on the PipelineRun details page to use the Tekton Results API and k8s API to get the TaskRuns.
All the features on the TaskRuns list page should work, e.g., column sorting and all the options of the kebab menu.

Additional Details:

Doc to install the Results on cluster https://docs.openshift-pipelines.org/operator/install-result.html

Tekton Results API swagger https://petstore.swagger.io/?url=https://raw.githubusercontent.com/avinal/tektoncd-results/openapi-fixes/docs/api/openapi.yaml

https://github.com/openshift/console/pull/13328

Story ODC-7395: Create ListPage component which list data from k8s as well Tekton results API

View the Description View the linked PRs

Description

As a user, I want to see data from the k8s API and Tekton results API in the same list page

Acceptance Criteria

Create ListPage component
It should list the data from k8s API and other APIs like Tekton Results
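Listing data from both sources requires merging two result sets without showing a run twice. The console implements this in TypeScript; the following is a minimal Go sketch for consistency with the other code in these notes. It assumes runs are matched by UID and that the live cluster copy wins over the archived Tekton Results copy; `Run`, `mergeRuns`, and `canDelete` are hypothetical. `canDelete` reflects the delete-handling expectation in ODC-7426.

```go
package main

import "sort"

// Source records where a run was loaded from.
type Source string

const (
	SourceCluster Source = "k8s"            // live custom resource on the cluster
	SourceResults Source = "tekton-results" // archived record only
)

// Run is a hypothetical, trimmed-down view of a PipelineRun or TaskRun.
type Run struct {
	UID       string
	Name      string
	StartTime int64 // Unix seconds; used for the default sort order
	Source    Source
}

// mergeRuns combines live runs and Tekton Results records into one list.
// A run present in both is shown once, preferring the live cluster copy.
// The result is sorted newest first, with UID as a stable tie-breaker.
func mergeRuns(cluster, results []Run) []Run {
	byUID := make(map[string]Run, len(cluster)+len(results))
	for _, r := range results {
		r.Source = SourceResults
		byUID[r.UID] = r
	}
	for _, r := range cluster {
		r.Source = SourceCluster // overwrites any archived copy
		byUID[r.UID] = r
	}
	merged := make([]Run, 0, len(byUID))
	for _, r := range byUID {
		merged = append(merged, r)
	}
	sort.Slice(merged, func(i, j int) bool {
		if merged[i].StartTime != merged[j].StartTime {
			return merged[i].StartTime > merged[j].StartTime
		}
		return merged[i].UID < merged[j].UID
	})
	return merged
}

// canDelete reports whether a Delete action applies to a row: only live
// cluster resources can be deleted; archived-only records cannot.
func canDelete(r Run) bool { return r.Source == SourceCluster }
```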

Additional Details:

Check with the console team whether there are plans to move the ListPage component to the dynamic plugin SDK.

https://github.com/openshift/console/pull/13311

Story ODC-7426: Deletion of pipeline run after tekton result installation should be user-friendly

View the Description View the linked PRs

Description of problem:

If you delete a PipelineRun, it is deleted from the k8s cluster, but its data remains available in the tekton-results database, so the PipelineRun list view still shows the deleted PipelineRuns.

Prerequisites (if any, like setup, operators/versions):

OpenShift Pipelines operator should be installed
tektonResults should be installed and working in the cluster (follow https://gist.github.com/vikram-raj/257d672a38eb2159b0368eaed8f8970a)

Steps to Reproduce

Create pipeline runs
Delete a pipeline run

Actual results:

The PipelineRun is shown indefinitely, and its kebab menu displays `Resource is being deleted`.

Expected results:

Handle this gracefully, for example by adding a hint that the Delete action only deletes the PipelineRun from etcd storage, and by disabling the Delete action for Results-based PLRs.

Reproducibility (Always/Intermittent/Only Once):

Always

Build Details:

Workaround:

Additional info:

Slack thread: https://redhat-internal.slack.com/archives/CHG0KRB7G/p1700140684929449

https://github.com/openshift/console/pull/13353

Story ODC-7384: Update the PipelineRun details page to use Tekton Results API to load all the info

View the Description View the linked PRs

Description

As a user, I want the details page to show the information from the source the PipelineRun was loaded from (Tekton Results or the k8s API).

Acceptance Criteria

Update the details page of the PipelineRun to use Tekton Result / k8s API from where the PipelineRun is loaded.
Update all the tabs(logs, YAML, parameters) on the details page to use Tekton Result / k8s API from where the PipelineRun is loaded.

Additional Details:

NOTE: Events are not available in the Tekton Results API. This is only available starting with OSP 1.12+

Doc to install the Results on cluster https://docs.openshift-pipelines.org/operator/install-result.html

Tekton Results API swagger https://petstore.swagger.io/?url=https://raw.githubusercontent.com/avinal/tektoncd-results/openapi-fixes/docs/api/openapi.yaml

https://github.com/openshift/console/pull/13328

Story ODC-7391: Workaround to bypass CORS and TLS issues to fetch data from the Tekton Results API

View the Description View the linked PRs

Add a new option to the "internet proxy" that allows insecure communication with the Tekton Results API and ignores CORS.

This must not be used in the final implementation! It is a workaround to unblock UI development.

https://github.com/openshift/console/pull/13175

Story ODC-7382: Use Tekton Results API along with k8s API to get PipelineRuns list

View the Description View the linked PRs

Description

As a user, I want to see the PipelineRuns from the Tekton Results data source.

Acceptance Criteria

In the dev and admin perspective update the PipelineRuns List page to use the Tekton Results API and k8s API to get the PLRs.
In the dev and admin perspective update the PipelineRuns List in the Pipeline details page to use Tekton Results API and k8s API to get the PLRs.
All the features on the list page should work, e.g., column sorting and all the options of the kebab menu.

Additional Details:

Doc to install the Results on cluster https://docs.openshift-pipelines.org/operator/install-result.html

Tekton Results API swagger https://petstore.swagger.io/?url=https://raw.githubusercontent.com/avinal/tektoncd-results/openapi-fixes/docs/api/openapi.yaml

https://github.com/openshift/console/pull/13311

Story ODC-7385: Update the TaskRun details page to use Tekton Results API to load all the info

View the Description View the linked PRs

Description

As a user, I want the details page to show the information from the source the TaskRun was loaded from (Tekton Results or the k8s API).

Acceptance Criteria

Update the details page of the TaskRun to use Tekton Result / k8s API from where the TaskRun is loaded.
Update all the tabs(logs, YAML) on the details page to use Tekton Result / k8s API from where the TaskRun is loaded.

Additional Details:

NOTE: Events are not available in the Tekton Results API. This is only available starting with OSP 1.12+

Doc to install the Results on cluster https://docs.openshift-pipelines.org/operator/install-result.html

Tekton Results API swagger https://petstore.swagger.io/?url=https://raw.githubusercontent.com/avinal/tektoncd-results/openapi-fixes/docs/api/openapi.yaml

https://github.com/openshift/console/pull/13328

Feature OCPSTRAT-854: Support Heterogeneous NodePools Within HyperShift

View the Description

DoD

We need to ensure we have parity with OCP and support heterogeneous clusters

https://github.com/openshift/enhancements/pull/1014

Define the UX for multi-arch NodePool input, e.g., always enforce a multi-arch image. This might not be possible because of the impact on the image registry in disconnected clusters: https://github.com/openshift/enhancements/pull/1014#discussion_r798444099.

Goal

Provide a way to install with NodePools of varied architectures, with the ability to autoscale

Why is this important?

Necessary to enable workloads with different architectures in the same hosted cluster.
Cost savings from more cost-effective ARM instances

Scenarios

I have an x86 hosted cluster and I want to have at least one NodePool running ARM workloads
I have an ARM hosted cluster and I want to have at least one NodePool running x86 workloads

Acceptance Criteria

Dev - Has a valid enhancement if necessary
CI - MUST be running successfully with tests automated
QE - covered in Polarion test plan and tests implemented
Release Technical Enablement - Must have TE slides

Dependencies (internal and external)

The management cluster must use a multi architecture payload image.
The target architecture is in the OCP payload
MCE has builds for the architecture used by the worker nodes of the management cluster

Done Checklist

CI - CI is running, tests are automated and merged.
Release Technical Enablement <link to Feature Enablement Presentation>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Enhancement merged: <link to meaningful PR or GitHub Issue>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Epic HOSTEDCP-1129: Define UX and failures for HC with multi-arch NodePools.

View the Description

User Story:

Using a multi-arch NodePool requires the HC to be multi-arch as well, which currently makes it easy for users to shoot themselves in the foot. We need to automate, via the CLI, the input required for multi-arch NodePools to work, e.g., a multi-arch flag at HC creation that sets the correct release image.

Acceptance Criteria:

Description of criteria:

Upstream documentation
Point 1
Point 2
Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

(optional) https://github/com/link.to.enhancement/
(optional) https://issues.redhat.com/link.to.spike
Engineering detail 1
Engineering detail 2

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Story HOSTEDCP-1113: Improve UX Around Arch Flag Validation

View the Description View the linked PRs

User Story:

As a user of HyperShift, I would like the UX around the `arch` flag validation improved so that it results in a smoother experience. The problem today is that we default Arch to `amd64`, but then set an invalid status message on the NodePool CRD if it's not blank and the platform is not AWS.

Acceptance Criteria:

To be defined once the path forward is decided; see Engineering Details for more details.

Description of criteria:

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

At a minimum we should remove the empty `Arch` flag check here.

What about modifying the section to something like this:

// Validate modifying CPU arch support for platform
if nodePool.Spec.Arch != "amd64" && nodePool.Spec.Platform.Type != hyperv1.AWSPlatform {
    SetStatusCondition(&nodePool.Status.Conditions, hyperv1.NodePoolCondition{
        Type:               hyperv1.NodePoolValidArchPlatform,
        Status:             corev1.ConditionFalse,
        Reason:             hyperv1.NodePoolInvalidArchPlatform,
        Message:            fmt.Sprintf("modifying CPU arch from 'amd64' not supported for platform: %s", nodePool.Spec.Platform.Type),
        ObservedGeneration: nodePool.Generation,
    })
}
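To make the intended behavior easy to test in isolation, the check above can be sketched as a pure function. This is a minimal Go sketch, assuming an empty `arch` should be treated as the `amd64` default rather than rejected; `validateArchForPlatform` is hypothetical, and plain strings stand in for the `hyperv1` platform types.

```go
package main

import "fmt"

// Simplified stand-ins for hyperv1.AWSPlatform and the NodePool arch default.
const (
	platformAWS = "AWS"
	defaultArch = "amd64"
)

// validateArchForPlatform (hypothetical) treats an empty arch as the default
// instead of failing, and only rejects non-amd64 arches on platforms other
// than AWS, mirroring the condition in the snippet above.
func validateArchForPlatform(arch, platform string) error {
	if arch == "" {
		arch = defaultArch
	}
	if arch != defaultArch && platform != platformAWS {
		return fmt.Errorf("modifying CPU arch from 'amd64' not supported for platform: %s", platform)
	}
	return nil
}
```

The controller would then call `SetStatusCondition` with `ConditionFalse` only when this returns an error.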

This requires/does not require a design proposal.
This requires/does not require a feature gate.

https://github.com/openshift/hypershift/pull/3072

Feature OCPSTRAT-867: Update console to handle that admins can opt-out from Build v1 (BuildConfig) and DeploymentConfigs

View the Description

Feature Overview (aka. Goal Summary)

When the cluster does not have v1 builds, console needs to either provide different ways to build applications or prevent erroneous actions.

Goals (aka. expected user outcomes)

Identify the build system in place and prompt user accordingly when building applications.

Requirements (aka. Acceptance Criteria):

The console will have to hide any workflows that rely solely on BuildConfigs when BuildConfigs are unavailable and Pipelines is not installed.

Use Cases (Optional):

As a developer, provide me with a default build option, and show options to override.
As a developer, prevent me from trying to create applications if no build option is present on the cluster.

ODC Jira - https://issues.redhat.com/browse/ODC-7352

Epic ODC-7352: Update console to handle that admins can opt-out from Build v1 (BuildConfig) and DeploymentConfigs

View the Description

Problem:

When the cluster does not have v1 builds, console needs to either provide different ways to build applications or prevent erroneous actions.

Goal:

Identify the build system in place and prompt user accordingly when building applications.

Why is it important?

Without this enhancement, users will encounter issues when trying to create applications on clusters that do not have the default s2i setup.

Use cases:

As a developer, provide me with a default build option, and show options to override.
As a developer, prevent me from trying to create applications if no build option is present on the cluster.

Acceptance criteria:

The console will have to hide any workflows that rely solely on BuildConfigs when BuildConfigs are unavailable and Pipelines is not installed.

If we detect Shipwright, then we can call that API instead of buildconfigs. We need to understand the timelines for the latter part, and create a separate work item for it.

If both buildconfigs and Shipwright are available, then we should default to Shipwright. This will be part of the separate work item needed to support Shipwright.
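The acceptance criteria above imply a precedence for choosing the default build option. The following minimal Go sketch encodes one reading of it: Shipwright over BuildConfig when both exist, Pipelines as the fallback when neither exists, and no option (hide the flow) when nothing is installed. `Capabilities`, `BuildOption`, and `defaultBuildOption` are hypothetical; the console implements this logic in TypeScript.

```go
package main

// BuildOption is a hypothetical enum of build strategies the Import from Git
// flow could default to.
type BuildOption string

const (
	BuildNone        BuildOption = "" // nothing available: hide the flow
	BuildShipwright  BuildOption = "Shipwright"
	BuildBuildConfig BuildOption = "BuildConfig"
	BuildPipelines   BuildOption = "Pipelines"
)

// Capabilities records which build-related APIs were detected on the cluster.
type Capabilities struct {
	Shipwright, BuildConfig, Pipelines bool
}

// defaultBuildOption returns the default build option; other available
// options would still be offered as overrides in the form.
func defaultBuildOption(c Capabilities) BuildOption {
	switch {
	case c.Shipwright:
		return BuildShipwright
	case c.BuildConfig:
		return BuildBuildConfig
	case c.Pipelines:
		return BuildPipelines
	default:
		return BuildNone
	}
}
```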

Dependencies (External/Internal):

Rob Gormley to confirm the timelines for when customers will have the option to remove BuildConfigs from their clusters. That will determine whether we take on this work in 4.15 or 4.16.

Design Artifacts:

Exploration:

Note:

Bug OCPBUGS-19311: Unhide the Import From Git Tab on the Add page if Pipelines Operator is installed and BuildConfig is not installed in the cluster

View the Description View the linked PRs

Description

As a user, I would like to use the Import from Git form even if I don't have BC installed in my cluster, but I have installed the Pipelines operator.

Acceptance Criteria

Show the Import From Git Tab on the Add page if Pipelines Operator is installed and BuildConfig is not installed in the cluster

Additional Details:

https://github.com/openshift/console/pull/13128

Bug OCPBUGS-18464: Hide the Builds NavItem if BuildConfig is not installed in the cluster

View the Description View the linked PRs

Description of problem:

Hide the Builds NavItem if BuildConfig is not installed in the cluster

https://github.com/openshift/console/pull/13141

Bug OCPBUGS-19314: Hide the DeploymentConfig option in the User Preferences

View the Description View the linked PRs

Description

As a user, I don't want to see the "DeploymentConfigs" option in the User settings when DeploymentConfigs are not installed in the cluster.

Acceptance Criteria

Hide the DeploymentConfig option as the Default Resource Type when it's not installed

Additional Details:

https://github.com/openshift/console/pull/13130

Bug OCPBUGS-22419: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/console/pull/13241

Story ODC-7386: Ensure the Import Form gets submitted without errors if BC is not installed while Pipelines is installed

View the Description View the linked PRs

Description

As a user, I want to use the Import from Git form without any errors, to create the Pipeline for my Git Application if I have disabled Builds and installed Pipelines in the cluster.

(During the implementation, we are also trying to keep in mind the changes that have to be made later while adding SW into this form)

Acceptance Criteria

Create a Build Section.
- Create a dropdown list where the user can select how the image is built, i.e., BC or Shipwright (later on).
- Add Pipelines as one of the options there.
- Move the Build Configuration section from the Advanced section to this section as an Expandable.
- Also, don't show this section if BC/SW is not installed.
Create a Deploy section.
- Move the Resource Type inside this.
- Move the Deployment section from the Advanced section to this section as an Expandable.
- Also, don't show this section if BC/SW is not installed, because the image used for deployment won't be built.
Have the Add Pipelines checkbox checked if there is no BC/SW.

Additional Details:

This is an initial prototype. This needs to be presented to the PMs for their feedback and updated accordingly.

The final UI must be approved by the PMs before it is merged.

https://github.com/openshift/console/pull/13145

Bug OCPBUGS-19313: Hide DeploymentConfig option from forms when it's not installed in the cluster

View the Description View the linked PRs

Description

As a user, I don't want to see the "DeploymentConfigs" option in any form I am filling out when DeploymentConfigs are not installed in the cluster.

Acceptance Criteria

Remove the DC option under the Resource Type dropdown in following forms:
- Import from Git
- Container Image
- Import JAR
- Builder Images (Developer Catalog)

Additional Details:

https://github.com/openshift/console/pull/13129

Feature OCPSTRAT-868: Update Shipwright API usage from v1alpha1 to v1beta1 when Shipwright operator goes GA

View the Description

Feature Overview (aka. Goal Summary)

Change API designator from alpha to beta for v1 Shipwright builds.

Goals (aka. expected user outcomes)

To maintain currency with API specs

Requirements (aka. Acceptance Criteria):

Update the API designator to beta
Ensure Shipwright works as normal

ODC Jira - https://issues.redhat.com/browse/ODC-7353

Epic ODC-7353: Update Shipwright API usage from v1alpha1 to v1beta1 when Shipwright operator goes GA

View the Description

Epic Goal

Change API designator from alpha to beta for v1 Shipwright builds.

Why is this important

To maintain currency with API specs

Scenarios

Acceptance Criteria (Mandatory)

Update the API designator to beta
Ensure Shipwright works as normal

Dependencies (internal and external)

Previous Work (Optional):

Open questions:

Done Checklist

Acceptance criteria are met
Non-functional properties of the Feature have been validated (such as performance, resource, UX, security or privacy aspects)
User Journey automation is delivered
Support and SRE teams are provided with enough skills to support the feature in production environment

Bug OCPBUGS-24367: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Story ODC-7414: Update Shipwright API from v1alpha1 to v1beta1

View the Description View the linked PRs

Description

As a user, I want to use the latest version of Shipwright APIs so that I am able to use Shipwright without any issues.

Acceptance Criteria

Identify the files where the APIs need to be updated and fix them.
Fix any e2e tests that may fail.
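As a minimal Go sketch of the version switch, the helper below prefers `shipwright.io/v1beta1` when the cluster serves it. The fallback to `v1alpha1` for clusters still running an older operator is an assumption added for illustration, not a requirement of this story; `shipwrightAPIVersion` is hypothetical, and `servedVersions` is assumed to come from API discovery for the `shipwright.io` group.

```go
package main

import "fmt"

// Shipwright build API group and the versions referenced in ODC-7414.
const (
	shipwrightGroup = "shipwright.io"
	v1beta1         = "v1beta1"
	v1alpha1        = "v1alpha1"
)

// shipwrightAPIVersion returns the apiVersion string to use for Build and
// BuildRun objects, preferring v1beta1 and falling back to v1alpha1.
func shipwrightAPIVersion(servedVersions []string) (string, error) {
	served := make(map[string]bool, len(servedVersions))
	for _, v := range servedVersions {
		served[v] = true
	}
	for _, v := range []string{v1beta1, v1alpha1} { // order encodes preference
		if served[v] {
			return shipwrightGroup + "/" + v, nil
		}
	}
	return "", fmt.Errorf("no supported %s version is served", shipwrightGroup)
}
```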

Additional Details:

https://github.com/openshift/console/pull/13304

Feature OCPSTRAT-874: Platform external support in Agent-Based Installer (generic external platform support)

Feature Overview

Support the external platform type so that the agent-based installer can install on partner cloud platforms that follow the platform external model.

For the agent-based installer, this is a follow-up to the platform external support for OCI delivered in OpenShift 4.14.

Epic AGENT-704: Platform external support in Agent-Based Installer (generic external platform support)

Feature Overview

Support the external platform type so that the agent-based installer can install on partner cloud platforms that follow the platform external model.

For the agent-based installer, this is a follow-up to the platform external support for OCI delivered in OpenShift 4.14.

Story AGENT-729: Support generic external platform

User Story:

As a user, I want to be able to:

use the external platform
without the platform name being restricted to oci

so that I can achieve

installation of an OCP cluster on an external platform

Acceptance Criteria:

Description of criteria:

No validation restricts the platform name to oci for external platforms
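
A minimal sketch of the relaxed check, assuming an install-config parsed into a dict with a `platform.external.platformName` field. The function is hypothetical, not the installer's code, and keeping a basic non-empty check is itself an assumption.

```python
from typing import Any, Dict, List


def validate_external_platform(install_config: Dict[str, Any]) -> List[str]:
    """Return validation errors for platform.external (hypothetical helper)."""
    errors: List[str] = []
    external = install_config.get("platform", {}).get("external")
    if external is None:
        return errors  # not an external-platform install
    name = external.get("platformName", "")
    # Assumption: a non-empty name is still required.
    if not isinstance(name, str) or not name.strip():
        errors.append("platform.external.platformName must be a non-empty string")
    # Previously any name other than "oci" was rejected here; that check is gone.
    return errors
```

Any partner name, such as `acme-cloud`, now passes, just like `oci`.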

https://github.com/openshift/installer/pull/7585

Feature OCPSTRAT-875: Consolidated Post-GA OpenShift Virt Provider Enhancements

Feature Overview (aka. Goal Summary)

Consolidated Enhancement of HyperShift/KubeVirt Provider Post GA

This feature aims to provide a comprehensive enhancement to the HyperShift/KubeVirt provider integration post its GA release.

By consolidating CSI plugin improvements, core improvements, and networking enhancements, we aim to offer a more robust, efficient, and user-friendly experience.

Goals (aka. expected user outcomes)

User Persona: Cluster service providers / SRE
Functionality:
- Expanded CSI capabilities.
- Improved core functionalities of the KubeVirt Provider
- Enhanced networking capabilities.

Epic CNV-33567: Hypershift/KubeVirt platform Smart Defaults

Goal

Currently, the Hypershift's HostedCluster and NodePool APIs are difficult to use directly. The "hcp" cli alleviates this complexity to some degree, but comes at the cost of requiring usage of a cli tool rather than creating the resources directly.

The goal of this epic is to reduce the complexity of the HostedCluster and NodePool APIs to the point that users only need to specify a small set of values at creation time, with a mutating webhook filling in the remaining details during admission, using the defaults that the "hcp" CLI currently uses.

Essentially, the goal is to move the "magic" defaulting that is so convenient for users out of the "hcp" tool and into the HyperShift operator backend, using a mutating webhook.
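
To make the mutating-webhook idea concrete, here is a minimal sketch that computes RFC 6902 "add" operations for fields a user left unset and wraps them in an `admission.k8s.io/v1` AdmissionReview response. The default values and helper names are hypothetical; JSON pointer escaping (`~0`, `~1`) is not handled.

```python
import base64
import copy
import json
from typing import Any, Dict, List

# Hypothetical defaults for illustration; the real values are whatever hcp applies.
KUBEVIRT_NODEPOOL_DEFAULTS: Dict[str, Any] = {
    "/spec/platform/kubevirt/compute/cores": 2,
    "/spec/platform/kubevirt/compute/memory": "8Gi",
}


def default_patch(obj: Dict[str, Any], defaults: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Build JSON Patch "add" ops only for fields the user did not set."""
    work = copy.deepcopy(obj)  # tracks earlier ops without mutating obj
    ops: List[Dict[str, Any]] = []
    for pointer, value in defaults.items():
        parts = pointer.lstrip("/").split("/")
        node = work
        for i, key in enumerate(parts):
            if key not in node:
                # Add the first missing segment, nesting the remaining ones.
                nested: Any = value
                for k in reversed(parts[i + 1:]):
                    nested = {k: nested}
                node[key] = copy.deepcopy(nested)
                ops.append({"op": "add", "path": "/" + "/".join(parts[: i + 1]), "value": nested})
                break
            node = node[key]
    return ops


def admission_response(uid: str, obj: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap the patch in an AdmissionReview response (patch is base64-encoded JSON)."""
    ops = default_patch(obj, KUBEVIRT_NODEPOOL_DEFAULTS)
    response: Dict[str, Any] = {"uid": uid, "allowed": True}
    if ops:
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(json.dumps(ops).encode()).decode()
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview", "response": response}
```

User-supplied values are never overwritten. Only missing fields produce patch operations.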

User Stories

As a user creating self-managed HCP clusters, I would like to create an HCP using only HostedCluster and NodePool resources and have the backend controllers generate the etcd and ssh key secrets on my behalf
As a user creating self-managed HCP clusters, I’d like to provide a minimal HostedCluster and NodePool spec and have the backend perform all the platform specific defaulting at creation time.

Non-Requirements

Notes

WIP design document https://docs.google.com/document/d/1lFknxPASyGGGyurjLmGvXV5JJKJ-WnOSncMm7ej-gI4/edit

Done Checklist

Who	What	Reference
DEV	Upstream roadmap issue (or individual upstream PRs)	<link to GitHub Issue>
DEV	Upstream documentation merged	<link to meaningful PR>
DEV	gap doc updated	<name sheet and cell>
DEV	Upgrade consideration	<link to upgrade-related test or design doc>
DEV	CEE/PX summary presentation	label epic with cee-training and add a <link to your support-facing preso>
QE	Test plans in Polarion	<link or reference to Polarion>
QE	Automated tests merged	<link or reference to automated tests>
DOC	Downstream documentation merged	<link to meaningful PR>

Story CNV-33847: add etcd encryption key generation to backend

Today the hcp CLI tool renders an etcd encryption secret for each cluster it creates. We'd like the backend to perform this logic so users can more easily use the HostedCluster API directly, without needing the CLI tool.
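
A minimal sketch of generating such a secret in the backend, assuming a random 256-bit AES-CBC key stored base64-encoded in a Secret. The Secret name pattern and the `key` data key are assumptions for illustration, not confirmed HyperShift conventions.

```python
import base64
import secrets
from typing import Any, Dict


def etcd_encryption_secret(cluster_name: str, namespace: str) -> Dict[str, Any]:
    """Render a Secret holding a random AES-256 key (hypothetical helper)."""
    key = secrets.token_bytes(32)  # 32 bytes = 256-bit key from a CSPRNG
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {
            "name": f"{cluster_name}-etcd-encryption-key",  # hypothetical naming
            "namespace": namespace,
        },
        "type": "Opaque",
        # Secret.data values are base64-encoded; the "key" data key is assumed.
        "data": {"key": base64.b64encode(key).decode("ascii")},
    }
```

Presumably the backend would only generate a secret when the user has not referenced one of their own.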

https://github.com/openshift/hypershift/pull/3148

Story CNV-35774: HCP KubeVirt platform CRD defaulting

All HostedCluster (HC) and NodePool (NP) API defaulting within the hcp CLI that impacts the KubeVirt platform should be moved to CRD defaulting and mutating webhooks.
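
CRD defaulting works through `default` values in the CRD's OpenAPI v3 schema. The sketch below mirrors that behavior in simplified form: a missing field receives its default, and nested defaults are then applied, including inside a defaulted `{}`. The schema fragment is hypothetical, and arrays are not handled.

```python
import copy
from typing import Any, Dict


def apply_schema_defaults(obj: Any, schema: Dict[str, Any]) -> Any:
    """Return a copy of obj with `default` values from the schema filled in."""
    result = copy.deepcopy(obj)
    if not isinstance(result, dict) or schema.get("type") != "object":
        return result  # scalars and arrays are left as-is in this sketch
    for name, prop in schema.get("properties", {}).items():
        if name not in result and "default" in prop:
            result[name] = copy.deepcopy(prop["default"])
        if name in result:
            # Recurse so nested defaults apply, including inside a defaulted {}.
            result[name] = apply_schema_defaults(result[name], prop)
    return result


# Hypothetical schema fragment for a KubeVirt platform section.
KUBEVIRT_SCHEMA = {
    "type": "object",
    "properties": {
        "compute": {
            "type": "object",
            "default": {},
            "properties": {
                "cores": {"type": "integer", "default": 2},
                "memory": {"type": "string", "default": "8Gi"},
            },
        },
    },
}
```

Static defaults like these fit CRD defaulting. Defaults that depend on other fields or cluster state would still need the mutating webhook.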

https://github.com/openshift/hypershift/pull/3116

Epic CNV-33392: Hypershift/KubeVirt Basic Multus Integration

Goal

The goal of this epic is to provide a solution for tying HyperShift/KubeVirt VM worker nodes into networks outside of the default pod network.

An example scenario for this Epic is a user who wishes to run their KubeVirt worker node VMs on a network they have configured within their datacenter. The user already has IPAM on their network (likely through DHCP) and wishes the KubeVirt VMs for their HCP to be tied to this externally provisioned network rather than the default pod network provided by OVNKubernetes.

In this scenario, we need to provide a way to configure the NodePool to use this user-provided network, and to ensure that the CAPI ecosystem components (capk, cloud-provider-kubevirt) work as expected with this VM configuration.

User Stories

As a user creating and managing KubeVirt platform HCPs, I would like to control what network the KubeVirt VM worker nodes reside on.

Non-Requirements

TODO

Notes

Any additional details or decisions made/needed

Owners

Role	Contact
PM	Peter Lauterbach
Documentation Owner	TBD
Delivery Owner	(See assignee)
Quality Engineer	(See QE Assignee)

Done Checklist

Who	What	Reference
DEV	Upstream code and tests merged	https://github.com/openshift/hypershift/pull/3066
DEV	Upstream documentation merged	https://github.com/openshift/hypershift/pull/3464
DEV	gap doc updated	N/A
DEV	Upgrade consideration	None
DEV	CEE/PX summary presentation	N/A
QE	Test plans in Polarion	N/A
QE	Automated tests merged	https://github.com/openshift/hypershift/pull/3449
DOC	Downstream documentation merged	https://github.com/openshift/hypershift/pull/3464

Bug OCPBUGS-23484: activate nodeip-configuration at MCO for kubevirt platform

Description of problem:

The HyperShift KubeVirt provider is missing the OpenShift mechanism for selecting which interface/IP address kubelet uses to register the node.

The nodeip-configuration.service should be activated in the MCO for the KubeVirt platform.
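
The bug below comes down to kubelet and ovn-k disagreeing about the node IP. The following sketch shows a single deterministic selection that every component could share. It assumes one heuristic: take the first non-link-local, non-loopback address on the interface that holds the default route. This is an illustration only, not the actual nodeip-configuration logic, and the interface names are hypothetical.

```python
import ipaddress
from typing import Dict, List, Optional


def pick_node_ip(addresses: Dict[str, List[str]], default_route_iface: Optional[str]) -> Optional[str]:
    """Return the first usable address on the default-route interface."""
    if default_route_iface is None:
        return None
    for cidr in addresses.get(default_route_iface, []):
        ip = ipaddress.ip_interface(cidr).ip
        if ip.is_link_local or ip.is_loopback:
            continue  # never register these as the node IP
        return str(ip)
    return None
```

If kubelet and ovn-k both consume one such decision, for example a file written once at boot, they cannot drift onto different NICs.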

Version-Release number of selected component (if applicable):

4.15.0

How reproducible:

Always

Steps to Reproduce:

Depends on hypershift kubevirt multinet feature https://github.com/openshift/hypershift/pull/3066

1. Create an OpenShift libvirt/baremetal cluster with MetalLB, CNV, ODF, local-storage and kubernetes-nmstate, with a pair of extra NICs on the nodes
2. Create the following NetworkAttachmentDefinitions and NodeNetworkConfigurationPolicy (NNCP) to connect those extra NICs
---
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: net1
  annotations:
    k8s.v1.cni.cncf.io/resourceName: bridge.network.kubevirt.io/net1
spec:
  config: >
    {
        "cniVersion": "0.3.1",
        "name": "net1",
        "plugins": [{
            "type": "cnv-bridge",
            "bridge": "net1",
            "ipam": {}
        }]
    }
---
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: net2
  annotations:
    k8s.v1.cni.cncf.io/resourceName: bridge.network.kubevirt.io/net2
spec:
  config: >
    {
        "cniVersion": "0.3.1",
        "name": "net2",
        "plugins": [{
            "type": "cnv-bridge",
            "bridge": "net2",
            "ipam": {}
        }]
    }
---
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: multi-net
spec:
  desiredState:
    interfaces:
    - name: net1
      type: linux-bridge
      state: up
      ipv4:
        enabled: false
      ipv6:
        enabled: false
      bridge:
        options:
          stp:
            enabled: true
        port:
          - name: ens4
    - name: net2
      type: linux-bridge
      state: up
      ipv4:
        enabled: false
      ipv6:
        enabled: false
      bridge:
        options:
          stp:
            enabled: true
        port:
          - name: ens5
3. Create a KubeVirt hosted cluster that uses those NICs, passing the following flags: --additional-network=name:default/net1 --additional-network=name:default/net2 --attach-default-network=false

Actual results:

kubelet ends up exposing the IP from net2, but ovn-k uses net1

Expected results:

kubelet and ovn-k should use net1

Additional info:

https://github.com/openshift/machine-config-operator/pull/4039

Story HOSTEDCP-1311: Hypershift API for assigning custom network interfaces to KubeVirt NodePool

We need the ability to configure a KubeVirt platform NodePool to use a custom network interface (not the default pod network) when creating the VMs.

Since cloud-provider-kubevirt cannot mirror LBs when a NodePool is not on an OVNKubernetes-defined network, we need to make sure that cloud-provider-kubevirt's LB mirroring behavior is disabled when custom networks are in use.

The scenarios to cover are the ones extracted from the notes doc https://docs.google.com/document/d/1zzyHxUEPyEM4hgRh_jww4gRIKJhRTxYeKeSYhxw-pDc/edit

Scenarios

Secondary network as single interface for VM

Multiple Secondary Networks as multiple interfaces for VM

Secondary network + pod network (default for kubelet) as multiple interfaces for VM
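
The three scenarios can be sketched as a mapping from NodePool settings to the `networks`/`interfaces` lists of a KubeVirt VM template. This assumes the NodePool exposes a list of `namespace/name` NetworkAttachmentDefinition references and an "attach default network" flag. The `ifaceN` naming and the masquerade binding for the pod network are hypothetical choices for illustration.

```python
from typing import Any, Dict, List, Tuple


def vm_networks(additional: List[str], attach_default: bool = True) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Return (networks, interfaces) for a KubeVirt VM template (sketch)."""
    if not attach_default and not additional:
        raise ValueError("a VM needs at least one network")
    networks: List[Dict[str, Any]] = []
    interfaces: List[Dict[str, Any]] = []
    if attach_default:
        networks.append({"name": "default", "pod": {}})
        interfaces.append({"name": "default", "masquerade": {}})
    for i, nad in enumerate(additional, start=1):
        name = f"iface{i}"  # hypothetical naming scheme
        networks.append({"name": name, "multus": {"networkName": nad}})
        interfaces.append({"name": name, "bridge": {}})
    return networks, interfaces


def lb_mirroring_enabled(additional: List[str]) -> bool:
    """Per this story, LB mirroring is disabled whenever custom networks are used."""
    return not additional
```

Scenario 1 is `vm_networks(["default/net1"], attach_default=False)`, scenario 2 passes several NADs, and scenario 3 keeps the default pod network alongside them.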

https://github.com/openshift/hypershift/pull/3066

Epic CNV-33568: Hypershift/KubeVirt WebHook Validation

Goal

The goal of this epic is to introduce a validating webhook for the KubeVirt platform that executes the HostedCluster and NodePool validation at admission time.

One important note here is that the core Hypershift team has a requirement that all validation logic must be within the controller loop. This does not exclude the usage of a validation webhook, it merely means that if we introduce a validating webhook that it cannot replace the controller validation.

This means the task involves abstracting our validation logic so that the controller and the validating webhook share the same code.
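
A minimal sketch of the shared-logic pattern: one pure validation function, with the webhook turning its result into an admission decision and the controller turning the same result into a status condition. The validated rule, the condition type name, and all helper names are hypothetical.

```python
from typing import Any, Dict, List


def validate_kubevirt_nodepool(spec: Dict[str, Any]) -> List[str]:
    """Single source of truth, called by both the webhook and the controller."""
    kubevirt = spec.get("platform", {}).get("kubevirt")
    if kubevirt is None:
        return ["spec.platform.kubevirt is required"]
    errors: List[str] = []
    cores = kubevirt.get("compute", {}).get("cores", 2)
    if type(cores) is not int or cores < 1:
        errors.append("spec.platform.kubevirt.compute.cores must be a positive integer")
    return errors


def admit(spec: Dict[str, Any]) -> Dict[str, Any]:
    """Validating webhook: reject at admission time."""
    errors = validate_kubevirt_nodepool(spec)
    response: Dict[str, Any] = {"allowed": not errors}
    if errors:
        response["status"] = {"message": "; ".join(errors)}
    return response


def validity_condition(spec: Dict[str, Any]) -> Dict[str, str]:
    """Controller: report the same result as a status condition."""
    errors = validate_kubevirt_nodepool(spec)
    return {"type": "ValidPlatformConfig", "status": "False" if errors else "True",
            "message": "; ".join(errors)}
```

Because the webhook is optional, the controller path must keep working on its own. Sharing the function guarantees both paths give the same verdict.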

Design Document

https://docs.google.com/document/d/167xGJaUXWZr-foIl5ft8FNu7S0jr0CbNwRTPiEZBMmQ/edit#heading=h.trvmlnsptuct

User Stories

As a user creating HCPs with the KubeVirt platform, I want to discover validation errors at creation time rather than through a condition on the HostedCluster and NodePool objects after creation.

Non-Requirements

Notes

Any additional details or decisions made/needed

Done Checklist

Who	What	Reference
DEV	Upstream roadmap issue (or individual upstream PRs)	<link to GitHub Issue>
DEV	Upstream documentation merged	<link to meaningful PR>
DEV	gap doc updated	<name sheet and cell>
DEV	Upgrade consideration	<link to upgrade-related test or design doc>
DEV	CEE/PX summary presentation	label epic with cee-training and add a <link to your support-facing preso>
QE	Test plans in Polarion	<link or reference to Polarion>
QE	Automated tests merged	<link or reference to automated tests>
DOC	Downstream documentation merged	<link to meaningful PR>

Story CNV-34093: verify release image falls within supported release window

The HyperShift operator only supports specific versions of release payloads. We'd like to give users early feedback by validating that the release payload they have picked falls within the backend operator's supported window.

The backend controller loop already performs this check. We'd like the same check to be introduced into an optional validating webhook.
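
A minimal sketch of the window check, assuming release versions of the form `4.15.2` (optionally with a `-rc.1` or nightly suffix) and an inclusive minor-version window. The window bounds here are hypothetical; the real ones come from the operator.

```python
from typing import Optional, Tuple

Version = Tuple[int, int]

SUPPORTED_MIN: Version = (4, 14)  # hypothetical bounds
SUPPORTED_MAX: Version = (4, 15)


def minor_version(release: str) -> Version:
    """'4.15.2' -> (4, 15); anything after the first '-' is ignored."""
    major, minor = release.split("-", 1)[0].split(".")[:2]
    return int(major), int(minor)


def check_release_window(release: str, lo: Version = SUPPORTED_MIN,
                         hi: Version = SUPPORTED_MAX) -> Optional[str]:
    """Return an error message, or None if the release is inside the window."""
    v = minor_version(release)
    if v < lo or v > hi:
        return (f"release {release} is outside the supported window "
                f"{lo[0]}.{lo[1]}-{hi[0]}.{hi[1]}")
    return None
```

Tuple comparison gives correct ordering for minor versions such as 4.9 versus 4.14, which plain string comparison would not.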

https://github.com/openshift/hypershift/pull/3184

Story CNV-34094: validate infra cnv and ocp versions meet min version requirements for HCP KubeVirt VMs

The underlying infra cluster hosting HCP KubeVirt worker VMs must meet some versioning requirements (CNV >= 4.14, OCP >= 4.14). A validation check enforces this on the backend today. We'd like to move this validation to a webhook so that users get early feedback at creation time if the validation will fail.
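
The two minimums stated above can be checked together. This is a minimal sketch, assuming the webhook has already discovered the infra cluster's CNV and OCP versions as plain dotted strings. How they are discovered is out of scope, and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple

MIN_VERSIONS: Dict[str, Tuple[int, int]] = {"cnv": (4, 14), "ocp": (4, 14)}


def parse(version: str) -> Tuple[int, ...]:
    """'4.14.1' -> (4, 14, 1); pre-release suffixes after '-' are dropped."""
    return tuple(int(p) for p in version.split("-", 1)[0].split("."))


def infra_version_errors(found: Dict[str, str]) -> List[str]:
    """Compare discovered versions against the minimums (hypothetical helper)."""
    errors: List[str] = []
    for component, minimum in MIN_VERSIONS.items():
        version = found.get(component)
        if version is None:
            errors.append(f"{component}: version unknown")
        elif parse(version)[:2] < minimum:
            errors.append(f"{component} {version} is older than the required "
                          f"{minimum[0]}.{minimum[1]}")
    return errors
```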

https://github.com/openshift/hypershift/pull/3132

Epic CNV-25677: Post GA: Hypershift Kubevirt Core Enhancements

Goal

Post GA quality of life improvements for the HyperShift/KubeVirt core

User Stories

Non-Requirements

Notes

Any additional details or decisions made/needed

Done Checklist

Who	What	Reference
DEV	Upstream roadmap issue (or individual upstream PRs)	<link to GitHub Issue>
DEV	Upstream documentation merged	<link to meaningful PR>
DEV	gap doc updated	<name sheet and cell>
DEV	Upgrade consideration	<link to upgrade-related test or design doc>
DEV	CEE/PX summary presentation	label epic with cee-training and add a <link to your support-facing preso>
QE	Test plans in Polarion	<link or reference to Polarion>
QE	Automated tests merged	<link or reference to automated tests>
DOC	Downstream documentation merged	<link to meaningful PR>

Story CNV-30697: dedicated cpu/mem for KubeVirt node pools

KubeVirt node pools currently set only requests for CPU and memory. This doesn't guarantee that the KubeVirt VMs have access to dedicated resources, which some customers may want.

To resolve this, we should add a toggle to the NodePool's KubeVirt platform section that enables dedicated resources, giving each VM guaranteed, dedicated access to CPUs and memory.
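
One way such a toggle could translate into the VM spec is sketched below. When enabled, it sets KubeVirt's `dedicatedCpuPlacement` and makes limits equal to requests, which is what Guaranteed QoS requires. The function and toggle are hypothetical; only the two KubeVirt domain fields are real API.

```python
from typing import Any, Dict


def domain_compute(cores: int, memory: str, dedicated: bool) -> Dict[str, Any]:
    """Return the cpu/resources part of a KubeVirt VM domain spec (sketch)."""
    requests = {"cpu": str(cores), "memory": memory}
    domain: Dict[str, Any] = {"cpu": {"cores": cores}, "resources": {"requests": requests}}
    if dedicated:
        # Ask KubeVirt to pin each vCPU to a dedicated host CPU.
        domain["cpu"]["dedicatedCpuPlacement"] = True
        # Guaranteed QoS requires limits equal to requests for CPU and memory.
        domain["resources"]["limits"] = dict(requests)
    return domain
```

With the toggle off, the output matches today's requests-only behavior.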

https://github.com/openshift/hypershift/pull/3048

Story CNV-29003: re-enable nodepool upgrade test (Replace only)

In PR https://github.com/openshift/hypershift/pull/2576/ we had to disable the NodePool upgrade test, because no previous release has the new KubeVirt RHCOS variant available, so there is no release to upgrade from.

We need to re-enable this test once we have a stable previous release in CI to test against (after the 4.14 feature freeze and after 4.15 is branched).

https://github.com/openshift/hypershift/pull/3189

Story CNV-30444: add upstream documentation for multiqueue settings

We need to document that multiqueue should only be used with an MTU >= 9000 on the infra cluster. Smaller MTU sizes (for example, 1500) showed degraded results compared to not enabling multiqueue at all.
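
The documented rule can also be enforced as a guard when rendering the VM. This is a minimal sketch that sets KubeVirt's `networkInterfaceMultiqueue` device flag only at the 9000 MTU threshold stated above and returns a warning otherwise. Refusing rather than merely warning is a hypothetical design choice, and the helper is not part of HyperShift.

```python
from typing import Any, Dict, Optional, Tuple

MULTIQUEUE_MIN_MTU = 9000  # threshold stated in this story


def multiqueue_devices(infra_mtu: int, requested: bool) -> Tuple[Dict[str, Any], Optional[str]]:
    """Return (devices fragment, warning) for a KubeVirt VM domain."""
    if not requested:
        return {}, None
    if infra_mtu < MULTIQUEUE_MIN_MTU:
        return {}, (f"multiqueue not enabled: infra MTU {infra_mtu} is below "
                    f"{MULTIQUEUE_MIN_MTU}; smaller MTUs performed worse with multiqueue")
    return {"networkInterfaceMultiqueue": True}, None
```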

https://github.com/openshift/hypershift/pull/3129

Story CNV-23418: design unsupported escape hatch mechanism custom HS/KV vms configuration

CNV QE, field engineers, and developers often need to test hypershift kubevirt in a way that isn't officially supported yet, and this often involves needing to modify the kubevirt VM's spec to enable some sort of feature, add an interface/volume, or something else along those lines.

We need to design a mechanism that works as an escape hatch to allow these sorts of unsupported modifications to be experimented with easily. This mechanism should not be a part of the official Hypershift APIs, but instead something that people can influence via an annotation or similar means.

It's likely this feature will serve as a way for us to grant temporary support exceptions to customers as well.

This can be achieved using an annotation that contains a JSON patch. Below is an example of how such a JSON patch might be placed on a NodePool so that the VMs it generates get a secondary interface.

apiVersion: hypershift.openshift.io/v1beta1
kind: NodePool
metadata:
  annotations:
    hypershift.openshift.io/kubevirt-vm-jsonpatch: |-
      [
        {
          "op": "add",
          "path": "/spec/template/spec/networks/-",
          "value": {"name": "secondary", "multus": {"networkName": "mynetwork"}}
        },
        {
          "op": "add",
          "path": "/spec/template/spec/domain/devices/interfaces/-",
          "value": {"name": "secondary", "bridge": {}}
        }
      ]
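
To see what such an annotation does to a generated VM, here is a minimal, hand-rolled subset of RFC 6902 that supports only `add`, where `-` appends to an array. It is for illustration only; a real implementation would presumably use a full JSON Patch library. The sketch assumes the generated VM already has `networks` and `interfaces` arrays; otherwise the `/-` paths would fail.

```python
import copy
import json
from typing import Any, Dict, List


def _unescape(token: str) -> str:
    # RFC 6901 pointer escaping: "~1" -> "/", then "~0" -> "~".
    return token.replace("~1", "/").replace("~0", "~")


def apply_add_ops(doc: Any, ops: List[Dict[str, Any]]) -> Any:
    """Apply JSON Patch "add" operations to a copy of doc."""
    doc = copy.deepcopy(doc)
    for op in ops:
        if op["op"] != "add":
            raise ValueError(f"unsupported op in this sketch: {op['op']}")
        *parents, last = [_unescape(t) for t in op["path"].lstrip("/").split("/")]
        target = doc
        for token in parents:
            target = target[int(token)] if isinstance(target, list) else target[token]
        value = copy.deepcopy(op["value"])
        if isinstance(target, list):
            target.append(value) if last == "-" else target.insert(int(last), value)
        else:
            target[last] = value
    return doc


annotation = """[
  {"op": "add", "path": "/spec/template/spec/networks/-",
   "value": {"name": "secondary", "multus": {"networkName": "mynetwork"}}},
  {"op": "add", "path": "/spec/template/spec/domain/devices/interfaces/-",
   "value": {"name": "secondary", "bridge": {}}}
]"""
vm = {"spec": {"template": {"spec": {
    "networks": [{"name": "default", "pod": {}}],
    "domain": {"devices": {"interfaces": [{"name": "default", "masquerade": {}}]}},
}}}}
patched = apply_add_ops(vm, json.loads(annotation))
```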

Feature OCPSTRAT-898: Need to ensure ICSP and IDMS interoperate especially in multi-tenant scenario

Feature Overview (aka. Goal Summary)

In a multi-tenant cluster, both ImageContentSourcePolicy (ICSP) and ImageDigestMirrorSet (IDMS) objects should be functional at the same time.

Goals (aka. expected user outcomes)

Enable both ICSP and IDMS objects to exist on the cluster at the same time and roll out both configurations.

Requirements (aka. Acceptance Criteria):

ICSP and IDMS objects should be functional

Provide an ICSP-to-IDMS migration path that does not require a node reboot, which can cause disruption.

Epic OCPNODE-1771: allow both CRD for ICSP resources to IDMS

Epic Goal

As an ICSP user on an existing cluster, I want to migrate to IDMS without breaking current workloads and without manually removing ICSP objects or creating IDMS objects.

Why is this important?

Scenarios

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
...

Dependencies (internal and external)

Previous Work (Optional):

Open questions::

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Story OCPNODE-1800: image-registry allow both ICSP IDMS

Do not error out if both ICSP and IDMS resources exist.

https://github.com/openshift/image-registry/pull/377

Story OCPNODE-1799: MCO support both ICSP IDMS

The MCO watches both ICSP and IDMS objects. As an OpenShift developer, I want it to render content from both CRD kinds into the underlying configuration.
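
A minimal sketch of processing both kinds together, merging `spec.repositoryDigestMirrors` (ICSP) and `spec.imageDigestMirrors` (IDMS) into one source-to-mirrors mapping with duplicates removed. The ordering (ICSP entries first) is an assumption, and `mirrorSourcePolicy` and the actual registries.conf rendering are not modeled.

```python
from typing import Any, Dict, List


def merge_mirrors(icsps: List[Dict[str, Any]], idmss: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Merge digest-mirror rules from ICSP and IDMS objects (sketch)."""
    merged: Dict[str, List[str]] = {}

    def add(source: str, mirrors: List[str]) -> None:
        entry = merged.setdefault(source, [])
        for mirror in mirrors:
            if mirror not in entry:  # keep first-seen order, drop duplicates
                entry.append(mirror)

    for icsp in icsps:
        for rule in icsp.get("spec", {}).get("repositoryDigestMirrors", []):
            add(rule["source"], rule.get("mirrors", []))
    for idms in idmss:
        for rule in idms.get("spec", {}).get("imageDigestMirrors", []):
            add(rule["source"], rule.get("mirrors", []))
    return merged
```

Because both inputs feed one mapping, a partially migrated cluster ends up with the same effective mirrors as before migration, as long as the IDMS repeats the ICSP's rules.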

https://github.com/openshift/machine-config-operator/pull/3898

Feature OCPSTRAT-912: Enable AWS Install for ROSA with Terraform-Free Image

Feature Overview (aka. Goal Summary)

Goals (aka. expected user outcomes)

Remove use of Terraform in the IPI Installer from the top providers: AWS, vSphere, Metal, and Azure.

Requirements (aka. Acceptance Criteria):

The IPI Installer no longer contains or uses Terraform.

Use Cases (Optional):

Questions to Answer (Optional):

Out of Scope

Background

Customer Considerations

Documentation Considerations

Interoperability Considerations

Epic CORS-2830: Provision AWS Infrastructure with SDK

Epic Goal

Provision AWS infrastructure without the use of Terraform

Why is this important?

This is a key piece in producing a terraform-free binary for ROSA. See parent epic for more details.

Scenarios

The new provider should aim to provide the same results as the existing AWS terraform provider.

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
...

Dependencies (internal and external)

Previous Work (Optional):

https://github.com/openshift/installer/compare/master...enxebre:installer:no-terraform-poc?expand=1

Open questions::

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

https://github.com/openshift/installer/pull/7676

Epic CORS-2829: Enable non-Terraform Infra Providers

Epic Goal

Enact a strategy to comply with license changes around Terraform and handle any CVEs that may arise (within Terraform) during the process of replacing Terraform.

Two major parts:

Get AWS off Terraform (for ROSA)
Strategy for all other platforms that remain on Terraform

Why is this important?

Hashicorp will continue to backport CVE fixes to MPL versions of Terraform through the end of 2023. After that period, we will not be able to address any CVEs within Terraform through upgrades. This epic provides a strategy to use until we can remove Terraform entirely from the product.

Scenarios

AWS: due to FedRAMP compliance requirements for ROSA, we will need to fix any medium CVE, whether or not it is exploitable.
Backporting (see open questions)

Acceptance Criteria

AWS platform must be able to fix all medium CVEs, regardless of whether they are exploitable
All other platforms must be able to handle CVEs based on our normal practices
...

Dependencies (internal and external)

If we decide to produce a ROSA-specific build (as expected), ROSA and Hive will need to be able to consume a separate installer binary.

Previous Work (Optional):

Open questions::

Priority of backporting CVE fixes
Can we get more concrete about standards for fixing CVEs?
What would managing our own Terraform fork entail?
Can we remove the alibaba provider? This would be helpful to decrease vulnerabilities.

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Story CORS-2876: Create installer-altinfra image

Create an installer image to be promoted to the release payload that contains the openshift altinfra binary (produced with build tags).

https://github.com/openshift/installer/pull/7711

Story CORS-2836: Infrastructure Provider Interface: Decouple Terraform

User Story:

Encapsulate Terraform to its own package (removing dependencies from pkg/asset and cmd) so that Terraform can be included or removed based on build tags.

The interface provides a way to substitute other infrastructure providers.
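
The decoupling idea can be sketched language-neutrally as an abstract provider plus a registry. In this illustration, the registry stands in for Go build tags: only providers compiled into a given binary register themselves. All class and function names are hypothetical, and the installer's real interface is written in Go.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Type


class InfraProvider(ABC):
    """Hypothetical infrastructure provider interface."""

    @abstractmethod
    def provision(self, cluster_id: str) -> List[str]:
        """Create infrastructure and return identifiers of what was created."""

    @abstractmethod
    def destroy(self, cluster_id: str) -> None:
        """Tear down everything created by provision."""


# Stand-in for build tags: only providers present in this binary register.
_REGISTRY: Dict[str, Type[InfraProvider]] = {}


def register(platform: str) -> Callable[[Type[InfraProvider]], Type[InfraProvider]]:
    def wrap(cls: Type[InfraProvider]) -> Type[InfraProvider]:
        _REGISTRY[platform] = cls
        return cls
    return wrap


@register("aws")
class AWSSDKProvider(InfraProvider):
    def provision(self, cluster_id: str) -> List[str]:
        return [f"vpc-{cluster_id}"]  # placeholder for real SDK calls

    def destroy(self, cluster_id: str) -> None:
        pass  # placeholder


def provider_for(platform: str) -> InfraProvider:
    try:
        return _REGISTRY[platform]()
    except KeyError:
        raise ValueError(f"no infrastructure provider built in for {platform!r}") from None
```

Callers depend only on `InfraProvider`, so a Terraform-backed provider and an SDK-backed one can be swapped without touching asset or command code.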

Acceptance Criteria:

Description of criteria:

Investigate the difficulties associated with backporting

https://github.com/openshift/installer/pull/7488

Story CORS-2835: Use build tags to produce openshift-install-altinfra

User Story:

I want to be able to produce an openshift-install binary for installing AWS (other platforms will not be supported) which is free of Terraform.

Acceptance Criteria:

Description of criteria:

Upstream documentation

Story CORS-2877: Enable install config feature gate validation

The installer should support feature gate validation so that new providers can be enabled via feature gates.
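
A minimal sketch of the idea, assuming gates are written as `Name=true`/`Name=false` strings and that custom gates are only honored with `featureSet: CustomNoUpgrade`. Both are assumptions about the installer's rules. The gate name and the feature-to-gate mapping are hypothetical.

```python
from typing import Dict, List

# Hypothetical mapping: a provider or setting -> the gate that must enable it.
GATED_FEATURES: Dict[str, str] = {"aws-sdk-provisioning": "ExampleAltInfraGate"}


def parse_gates(entries: List[str]) -> Dict[str, bool]:
    """Parse entries such as "ExampleAltInfraGate=true"."""
    gates: Dict[str, bool] = {}
    for entry in entries:
        name, sep, value = entry.partition("=")
        if not sep or not name.strip() or value.strip().lower() not in ("true", "false"):
            raise ValueError(f"malformed feature gate {entry!r}")
        gates[name.strip()] = value.strip().lower() == "true"
    return gates


def validate_feature_gates(feature_set: str, gate_entries: List[str],
                           used_features: List[str]) -> List[str]:
    """Return errors for gated features used without their gate enabled."""
    errors: List[str] = []
    if gate_entries and feature_set != "CustomNoUpgrade":
        errors.append("featureGates may only be set with featureSet: CustomNoUpgrade")
    gates = parse_gates(gate_entries)
    for feature in used_features:
        gate = GATED_FEATURES.get(feature)
        if gate is not None and not gates.get(gate, False):
            errors.append(f"{feature} requires feature gate {gate}=true")
    return errors
```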

https://github.com/openshift/installer/pull/7413

Epic CORS-2977: Enable AWS SDK Install by Default

Epic Goal

Switch all AWS installations to use SDK implementation rather than Terraform

Why is this important?

Currently, AWS SDK installations are only available by using the separate `installer-altinfra` image. This change would switch all users to use the SDK installation method.

Scenarios

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
...

Dependencies (internal and external)

Previous Work (Optional):

Open questions::

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Story CORS-2978: Enable AWS SDK install via Feature Gate

https://github.com/openshift/installer/pull/7715

Feature OCPSTRAT-933: Hypershift guest cluster can use external OIDC token issuer

Feature Overview (aka. Goal Summary)

A guest cluster can use an external OIDC token issuer. This will allow machine-to-machine authentication workflows.

Goals (aka. expected user outcomes)

A guest cluster can configure OIDC providers to support the current capability: https://kubernetes.io/docs/reference/access-authn-authz/authentication/#openid-connect-tokens and the future capability: https://github.com/kubernetes/kubernetes/blob/2b5d2cf910fd376a42ba9de5e4b52a53b58f9397/staging/src/k8s.io/apiserver/pkg/apis/apiserver/types.go#L164 with an API that

allows fixing mistakes
alerts the owner of the configuration when a misconfiguration is likely (self-service)
makes it easy to distinguish a product failure (the expressed configuration was not applied) from a configuration failure (the expressed configuration was wrong)
makes cluster recovery possible when the external token issuer is permanently gone
allows (but might not require) removal of the existing OAuth server

Requirements (aka. Acceptance Criteria):

Use Cases (Optional):

Questions to Answer (Optional):

Out of Scope

Background

Customer Considerations

Documentation Considerations

Interoperability Considerations

Epic HOSTEDCP-1246: hypershift control plane wired with external oidc

Goal

Provide API for configuring external OIDC to management cluster components
Stop creating oauth server deployment
Stop creating oauth-apiserver
Stop registering oauth-apiserver backed apiservices
See what breaks next

Why is this important?

We need an API starting point for the ROSA CLI and OCM.
We need a cluster that demonstrates what breaks next, so that we can fix it.

Scenarios

Acceptance Criteria

Dev - Has a valid enhancement if necessary
CI - MUST be running successfully with tests automated
QE - covered in Polarion test plan and tests implemented
Release Technical Enablement - Must have TE slides
...

Dependencies (internal and external)

Previous Work (Optional):

Open questions:

Done Checklist

CI - CI is running, tests are automated and merged.
Release Technical Enablement <link to Feature Enablement Presentation>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Enhancement merged: <link to meaningful PR or GitHub Issue>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Story HOSTEDCP-1254: External OIDC: do not deploy oauth component when Authentication type is OIDC

Acceptance Criteria:

Description of criteria:

Upstream documentation

https://github.com/openshift/hypershift/pull/3151

Story HOSTEDCP-1253: External OIDC: bump openshift/api

View the Description View the linked PRs

We need the updates to the Authentication API to detect the authentication type of the cluster, and then deploy or skip the oauth components based on that type.

https://github.com/openshift/api/pull/1614

https://github.com/openshift/hypershift/pull/3135
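The decision these stories describe can be sketched as a small function. This is a minimal sketch, not HyperShift's code. It assumes that the cluster Authentication config (config.openshift.io/v1) exposes spec.type, that a missing or empty type behaves like "IntegratedOAuth", and that only "OIDC" disables the oauth components, as the story title states. The component labels are descriptive strings, not real resource names.

```python
from typing import Any, Mapping

# Descriptive labels (hypothetical) for the components HOSTEDCP-1246 stops creating.
OAUTH_COMPONENTS = (
    "oauth-server deployment",
    "oauth-apiserver",
    "oauth-apiserver backed APIServices",
)


def auth_type(authentication: Mapping[str, Any]) -> str:
    """Return spec.type, treating a missing or empty value as IntegratedOAuth (assumption)."""
    spec = authentication.get("spec") or {}
    return spec.get("type") or "IntegratedOAuth"


def components_to_deploy(authentication: Mapping[str, Any]) -> list[str]:
    """Return the oauth components to deploy; none when the type is OIDC."""
    if auth_type(authentication) == "OIDC":
        return []
    return list(OAUTH_COMPONENTS)
```

For example, components_to_deploy({"spec": {"type": "OIDC"}}) returns an empty list, while an Authentication object without a type keeps all three components.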

Story HOSTEDCP-1318: Set internal-oauth-disabled flag on openshift-apiserver when Auth type is OIDC

View the Description View the linked PRs

https://github.com/openshift/openshift-apiserver/pull/395

Bug HOSTEDCP-1336: bump openshift/api to get Authentication OAuth client fields

View the Description View the linked PRs

We need this for the console with external OIDC.

https://github.com/openshift/api/pull/1641

https://github.com/openshift/hypershift/pull/3282

Feature OCPSTRAT-937: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Epic SDN-4034: OpenShift North-South IPsec Implementation Enhancement and GA

View the Description View Demos

Epic Goal

Add an API extension for North-South IPsec.
Close gaps from ~~SDN-3604~~, mainly around upgrade.
Add telemetry.

Why is this important?

Without an API, customers are forced to use the MCO. This brings a set of limitations, mainly a reboot per change and the fact that configuration is shared across each pool, so per-node configuration is not possible.
A better upgrade solution will give us the ability to support a single host-based implementation.
Telemetry will give us more information on how widely IPsec is used.

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
Must allow for the possibility of offloading the IPsec encryption to a SmartNIC.

nmstate
k8s-nmstate
easier mechanism for cert injection (??)
telemetry

Dependencies (internal and external)

ITUP-44 - OpenShift support for North-South OVN IPSec
HATSTRAT-33 - Encrypt All Traffic to/from Cluster (aka IPSec as a Service)

Previous Work (Optional):

~~SDN-717~~ - Support IPSEC on ovn-kubernetes
~~SDN-3604~~ - Fully supported non-GA N-S IPSec implementation using machine config.

Open questions:

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Story SDN-4162: implement upgrade solution

View the linked PRs

https://github.com/openshift/cluster-network-operator/pull/2087

Feature OCPSTRAT-941: oc (CLI) needs to be functional or fail gracefully without oauth server

View the Description

Feature Overview (aka. Goal Summary)

oc, the OpenShift CLI, needs as close to feature parity as we can get without the built-in oauth server and its associated user and group management. This will enable scripts, documentation, blog posts, and knowledge base articles to function across all form factors, and across the same form factor with different configurations.

Goals (aka. expected user outcomes)

CLI users and scripts should be usable in a consistent way regardless of the token issuer configuration.

Requirements (aka. Acceptance Criteria):

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

High-level list of items that are out of scope. Initial completion during Refinement status.

Background

Provide any additional context needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

Interoperability Considerations

Epic WRKLDS-874: oc whoami must work without oauth-apiserver

View the Description View the linked PRs

Epic Goal*

What is our purpose in implementing this? What new capability will be available to customers?

`oc whoami` must work without the oauth-apiserver. An endpoint recently added to Kubernetes provides equivalent functionality.

Why is this important? (mandatory)

The oauth-apiserver does not control IdP information when external OIDC is used, so the oauth-apiserver is no longer deployed. This causes `oc whoami` to fail.
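A client can learn who it is authenticated as without the oauth-apiserver by asking the Kubernetes API server directly. This minimal sketch assumes the endpoint mentioned above is the SelfSubjectReview API (POST /apis/authentication.k8s.io/v1/selfsubjectreviews; older clusters may serve only a beta version). It shows only the request body and the parsing of the response; the HTTP call and credentials handling are left out, and this is not the code from the linked oc PR.

```python
import json


def self_subject_review_request() -> dict:
    """Body to POST; the API server fills in status from the caller's own credentials."""
    return {
        "apiVersion": "authentication.k8s.io/v1",
        "kind": "SelfSubjectReview",
    }


def username_from_review(response_body: str) -> str:
    """Extract status.userInfo.username from a SelfSubjectReview response."""
    review = json.loads(response_body)
    user_info = review.get("status", {}).get("userInfo", {})
    username = user_info.get("username")
    if not username:
        raise ValueError("response did not include status.userInfo.username")
    return username
```

Because the API server derives the answer from the token it already validated, the same call works whether the token came from the built-in oauth server or from an external OIDC issuer.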

Scenarios (mandatory)

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.

Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic.

Contributing Teams(and contacts) (mandatory)

Development -
Documentation -
QE -
PX -
Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.

Drawbacks or Risk (optional)

Done - Checklist (mandatory)

CI Testing - Basic e2e automation tests are merged and completing successfully
Documentation - Content development is complete.
QE - Test scenarios are written and executed successfully.
Technical Enablement - Slides are complete (if requested by PLM)
Engineering Stories Merged
All associated work items with the Epic are closed
Epic status should be "Release Pending"

https://github.com/openshift/oc/pull/1588

Feature OCPSTRAT-942: Console needs to be functional with external oidc token issuer

View the Description

Feature Overview (aka. Goal Summary)

When the internal oauth-server and oauth-apiserver are removed and replaced with an external OIDC issuer (such as Azure AD), the console must work for human users of the external OIDC issuer.

Goals (aka. expected user outcomes)

An end user can use the OpenShift console without a notable difference in experience. This must eventually work on both HyperShift and standalone, but HyperShift is the first priority if it impacts delivery.

Requirements (aka. Acceptance Criteria):

User can log in and use the console
User can get a kubeconfig that functions on the CLI with matching oc
Both of those work on hypershift
Both of those work on standalone.

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

High-level list of items that are out of scope. Initial completion during Refinement status.

Background

Provide any additional context needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

Interoperability Considerations

Epic CONSOLE-3804: console operator must stop creating oauthclients when API is not present

View the Description View the linked PRs

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

Epic Goal

When the oauthclient API is not present, the operator must stop creating the oauthclient

Why is this important?

This is preventing the operator from creating its deployment
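The guard this epic asks for can be sketched as "check discovery first, then create". This is a minimal sketch under stated assumptions, not the console-operator's code: discovery results are modeled as a mapping from group/version to the resource names it serves, and create_oauthclient is a hypothetical callback standing in for the real create call.

```python
from typing import Callable, Iterable, Mapping

OAUTH_GROUP_VERSION = "oauth.openshift.io/v1"


def oauthclient_api_present(discovery: Mapping[str, Iterable[str]]) -> bool:
    """True when discovery lists the oauthclients resource in oauth.openshift.io/v1."""
    return "oauthclients" in discovery.get(OAUTH_GROUP_VERSION, ())


def sync_oauthclient(
    discovery: Mapping[str, Iterable[str]],
    create_oauthclient: Callable[[], None],  # hypothetical stand-in for the API call
) -> str:
    """Create the OAuthClient only when the API exists; otherwise skip it so the
    rest of the sync, including the console deployment, can still proceed."""
    if not oauthclient_api_present(discovery):
        return "skipped"
    create_oauthclient()
    return "created"
```

The design choice is to treat a missing API as "nothing to do" rather than as an error, which is what keeps the operator from blocking its own deployment.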

Scenarios

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
...

Dependencies (internal and external)

Previous Work (Optional):

Open questions:

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Epic CONSOLE-3805: console operator must accept clientID and secret

View the Description

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

Epic Goal

When installed with external OIDC, the clientID and clientSecret need to be configurable to match the external (and unmanaged) OIDC server

Why is this important?

Without a configurable clientID and secret, I don't think the console can identify the user.
There must be a mechanism to do this on both hypershift and openshift, though the API may be very similar.

Scenarios

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
...

Dependencies (internal and external)

Previous Work (Optional):

Open questions:

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Story CONSOLE-3814: User and groups Nav items to be removed when using external oidc token

View the Description View the linked PRs

When the console is using an external oidc token the users and groups sections of the UI are no longer relevant, and we need not render them.

Acceptance criteria:

Conditionally render users and groups based on the existence of OAuth
Add a server flag to the console that determines the type of identity provider
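The conditional rendering described above can be sketched as a filter over the navigation items. This is a minimal sketch: the auth-type flag is modeled as a plain string, and the nav item ids "users" and "groups" are hypothetical, not the console's real extension ids.

```python
from typing import Sequence

# Hypothetical ids of nav items that only make sense with the built-in OAuth server.
OAUTH_ONLY_NAV_IDS = frozenset({"users", "groups"})


def visible_nav_items(items: Sequence[dict], auth_type: str) -> list[dict]:
    """Drop Users and Groups when the server flag reports external OIDC."""
    if auth_type != "OIDC":
        return list(items)
    return [item for item in items if item["id"] not in OAUTH_ONLY_NAV_IDS]
```

With items for users, groups, and roles, visible_nav_items(items, "OIDC") keeps only the roles entry, while any other auth type leaves the list unchanged.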

https://github.com/openshift/console/pull/13287

Task HOSTEDCP-1250: make console-operator pass OIDC client config flags

View the linked PRs

https://github.com/openshift/console-operator/pull/801

Story CONSOLE-3823: Refactor auth backend code in console

View the Description View the linked PRs

Console server code needs refactoring in order to move forward with backend changes for introducing auth against external OIDC.

AC: Move auth config to its own module

https://github.com/openshift/console/pull/13261

Feature OCPSTRAT-956: Enable Break Glass access Mechanism for Cloud Services (ROSA as MVP)

View the Description

Feature Overview (aka. Goal Summary)

Enable a "Break Glass Mechanism" in ROSA (Red Hat OpenShift Service on AWS) and other OpenShift cloud-services in the future (e.g., ARO and OSD) to provide customers with an alternative method of cluster access via short-lived certificate-based kubeconfig when the primary IDP (Identity Provider) is unavailable.

Goals (aka. expected user outcomes)

Enhance cluster reliability and operational flexibility.
Minimize downtime due to IDP unavailability or misconfiguration.
The primary personas here are OpenShift Cloud Services Admins and SREs as part of the shared responsibility.
This will be an addition to the existing ROSA IDP capabilities.

Requirements (aka. Acceptance Criteria)

Enable the generation of short-lived client certificates for emergency cluster access.
Ensure certificates are secure and conform to industry standards.
Functionality to invalidate short-lived certificates in case of an exploit.

Better UX

User Interface within OCM to facilitate the process.
SHOULD have audit capabilities.
Minimal latency when generating and using certificates (to reduce time without access to cluster).

Use Cases (Optional)

A customer's IDP is down, but they successfully use the break-glass feature to gain cluster access.
SREs use their own break-glass feature to perform critical operations on a customer's cluster.

Questions to Answer (Optional)

What is the lifetime of generated certificates? 7 days life and 1 day rotation?
What security measures are in place for certificate generation and storage?
What are the audit requirements?

Out of Scope

Replacement of primary IDP functionality.
Use of the break-glass mechanism for routine operations (i.e., this is an emergency/contingency mechanism)

Customer Considerations

The feature is not a replacement for the primary IDP.
Customers must understand the security implications of using short-lived certificates.

Documentation Considerations

How-to guides for using the break-glass mechanism.
FAQs addressing common concerns and troubleshooting.
Update existing ROSA IDP documentation to include this new feature.

Interoperability Considerations

Compatibility with existing ROSA, OSD (OpenShift Dedicated), and ARO (Azure Red Hat OpenShift) features.
Interoperability tests should include scenarios where both IDP and break-glass mechanism are engaged simultaneously for access.

Epic HOSTEDCP-1255: Hypershift issues cluster-admin kubeconfig usable by the customer

View the Description

Goal

Be able to provide the customer of managed OpenShift with a cluster-admin kubeconfig that allows them to access the cluster in the event that their identity provider (IdP or external OIDC) becomes unavailable or misconfigured.

Why is this important?

Managed OpenShift customers need to be able to access/repair their clusters without RH intervention (i.e. opening a ticket) in the case of an identity provider outage or misconfiguration.

Scenarios

Acceptance Criteria

Dev - Has a valid enhancement if necessary
CI - MUST be running successfully with tests automated
QE - covered in Polarion test plan and tests implemented
Release Technical Enablement - Must have TE slides
...

Dependencies (internal and external)

Coordination with OCM to make the kubeconfig available to customers upon request

Previous Work (Optional):

Open questions:

Done Checklist

CI - CI is running, tests are automated and merged.
Release Technical Enablement <link to Feature Enablement Presentation>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Enhancement merged: <link to meaningful PR or GitHub Issue>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Story HOSTEDCP-1257: Create controller for approving and issuing signed customer certs

View the Description View the linked PRs

User Story:

As a (user persona), I want to be able to:

Capability 1
Capability 2
Capability 3

so that I can achieve

Outcome 1
Outcome 2
Outcome 3

Acceptance Criteria:

Description of criteria:

Upstream documentation
Point 1
Point 2
Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

(optional) https://github/com/link.to.enhancement/
(optional) https://issues.redhat.com/link.to.spike
Engineering detail 1
Engineering detail 2

This requires/does not require a design proposal.
This requires/does not require a feature gate.

https://github.com/openshift/hypershift/pull/3324

Story HOSTEDCP-1256: Create a short lived signer with rotation for signing customer certs

View the Description View the linked PRs

Create a new short-lived signer CA that signs a cluster-admin kubeconfig we provide to the customer upon request.

The CA must be trusted by the KAS and included in the CA bundle along with the CA that will sign longer lived cert-based creds like those used by SRE.

TBD: should we create the signer at the point of kubeconfig request from the customer? Or should we always have the signer active through periodic rotation?

On-demand signer:

Pro: the kubeconfig will be valid for the entire lifetime of the signer
Con: we have to roll out the KAS deployment with the new signer, which adds latency to the kubeconfig request from the customer. The signer generation will be fast, so we could generate the kubeconfig quickly, but it would not be reliably honored by the KAS until the rollout is complete (~10-15m)

Always valid signer with rotation:

Pro: kubeconfig generation/publish is fast and honored by the KAS almost immediately
Con: the kubeconfig could be valid for a very short period of time if the signer is just about to rotate when the request for the kubeconfig is made
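The trade-off between the two options is easiest to see with a little date arithmetic. This is an illustrative sketch only: it assumes a client certificate stops working when its signer expires, uses a hypothetical 7-day signer lifetime (a figure floated in the OCPSTRAT-956 questions for certificate lifetime), and takes the upper end of the ~10-15m KAS rollout estimate.

```python
from datetime import datetime, timedelta

SIGNER_LIFETIME = timedelta(days=7)   # hypothetical value for illustration
KAS_ROLLOUT = timedelta(minutes=15)   # upper end of the ~10-15m estimate


def on_demand(request_time: datetime) -> tuple[timedelta, timedelta]:
    """Return (wait until the KAS honors the kubeconfig, usable validity).
    The signer is created at request time, but the KAS must roll out first."""
    signer_expiry = request_time + SIGNER_LIFETIME
    usable_from = request_time + KAS_ROLLOUT
    return KAS_ROLLOUT, signer_expiry - usable_from


def rotating(request_time: datetime, signer_created: datetime) -> tuple[timedelta, timedelta]:
    """The current signer is already trusted, so there is no wait, but
    validity ends when that signer expires (never negative)."""
    signer_expiry = signer_created + SIGNER_LIFETIME
    return timedelta(0), max(signer_expiry - request_time, timedelta(0))
```

With a rotating signer created 6 days and 23 hours before the request, rotating() reports no wait but only one hour of validity, which is the "very short period" risk described in the Con above. The on-demand option instead trades a 15-minute wait for close to the full 7 days.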

Feature OCPSTRAT-959: Remove openshift-sdn as an install-time option for newly-installed clusters at 4.15+

View the Description

Feature Overview (aka. Goal Summary)

As part of the deprecation progression of the openshift-sdn CNI plug-in, remove it as an install-time option for new 4.15+ release clusters.

Goals (aka. expected user outcomes)

The openshift-sdn CNI plug-in is sunsetting according to the following progression:

deprecation notice delivered at 4.14 (Release Notes, What's Next presentation)
removal as an install-time option at 4.15+
removal as an option and EOL support at 4.17 GA

Requirements (aka. Acceptance Criteria):

The openshift-sdn CNI plug-in will no longer be an install-time option for newly installed 4.15+ clusters across installation options.
Customer clusters currently using openshift-sdn that upgrade to 4.15 or 4.16 with openshift-sdn will remain fully supported.
EUS customers using openshift-sdn on an earlier release (e.g. 4.12 or 4.14) will still be able to upgrade to 4.16 and receive full support of the openshift-sdn plug-in.

Questions to Answer (Optional):

Will clusters using openshift-sdn and upgrading from earlier versions to 4.15 and 4.16 still be supported?
- YES
My customer has a hard requirement for the ability to install openshift-sdn 4.15 clusters. Are there any exceptions to support that?
- Customers can file a Support Exception for consideration, and the reason for the requirement (expectation: rare) must be clarified.

Out of Scope

Background

All development effort is directed to the default primary CNI plug-in, ovn-kubernetes, which has feature parity with the older openshift-sdn CNI plug-in that has been feature frozen for the entire 4.x timeframe. In order to best serve our customers now and in the future, we are reducing our support footprint to the dominant plug-in, only.

Documentation Considerations

Product Documentation updates to reflect the install-time option change.

Epic CORS-2932: Remove openshift-sdn as an install-time option for newly-installed clusters at 4.15+

View the Description

Epic Goal

The openshift-sdn CNI plug-in is sunsetting according to the following progression:

deprecation notice delivered at 4.14 (Release Notes, What's Next presentation)
removal as an install-time option at 4.15+
removal as an option and EOL support at 4.17 GA

Why is this important?

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
The openshift-sdn CNI plug-in will no longer be an install-time option for newly installed 4.15+ clusters across installation options.
Customer clusters currently using openshift-sdn that upgrade to 4.15 or 4.16 with openshift-sdn will remain fully supported.
EUS customers using openshift-sdn on an earlier release (e.g. 4.12 or 4.14) will still be able to upgrade to 4.16 and receive full support of the openshift-sdn plug-in.

Open questions:

Will clusters using openshift-sdn and upgrading from earlier versions to 4.15 and 4.16 still be supported?
- YES
My customer has a hard requirement for the ability to install openshift-sdn 4.15 clusters. Are there any exceptions to support that?
- Customers can file a Support Exception for consideration, and the reason for the requirement (expectation: rare) must be clarified.

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Story CORS-2950: Remove openshift-sdn as an install-time option at 4.15+

View the Description View the linked PRs

User Story:

As a (user persona), I want to be able to:

Have OpenShiftSDN (the openshift-sdn CNI plug-in) no longer be an option for networkType, making OVNKubernetes the only supported value for the network

so that I can achieve

The removal of the openshift-sdn CNI plug-in at install-time for 4.15+

Acceptance Criteria:

Description of criteria:

Upstream documentation
Point 1
Point 2
Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

(optional) https://github.com/openshift/enhancements/blob/master/enhancements/network/sdn-live-migration.md#rollback
https://github.com/openshift/installer/blob/f60ebb065b4242586f7afacc5f2be8afddbdfbde/pkg/types/validation/installconfig.go#L333C1-L333C85

This requires/does not require a design proposal.
This requires/does not require a feature gate.

https://github.com/openshift/installer/pull/7720
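The install-time check can be illustrated as a validation over a parsed install-config. This is not the installer's code (that lives in the installconfig.go validation linked above); it is a minimal sketch assuming the install-config is already loaded into a dict and the target release is given as a (major, minor) tuple. Because the check runs only at install time, clusters that upgrade with openshift-sdn are unaffected.

```python
def validate_network_type(install_config: dict, release: tuple[int, int]) -> list[str]:
    """Return validation errors for networking.networkType (empty list means valid)."""
    errors = []
    network_type = install_config.get("networking", {}).get("networkType", "")
    if network_type == "OpenShiftSDN" and release >= (4, 15):
        errors.append(
            "networking.networkType: OpenShiftSDN is not supported at install time "
            "for 4.15+; use OVNKubernetes"
        )
    return errors
```

Tuple comparison keeps the version check simple: (4, 16) >= (4, 15) is true and (4, 14) >= (4, 15) is false.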

Feature OCPSTRAT-967: Improve Console UI experience for Software Supply Chain Security (SSCS) Use Cases

View the Description

Outcome/Feature Overview (aka. Goal Summary)

Enrich the OpenShift Pipelines experience for DevSecOps and Software Supply Chain Security use cases such as CVEs, SBOMs and signatures.

Goals (aka. expected user outcomes)

Improving application developer experience when using OpenShift Pipelines by increasing awareness of important SSCS elements. An OpenShift Pipelines PipelineRun's Task can emit CVEs, SBOMs, policy reporting as well as identify signing status.

Requirements (aka. Acceptance Criteria):

CVE Summary as a column in PipelineRun list view
Link to SBOM for PipelineRuns
Add a badge/icon for chains.tekton.dev/signed=true for PipelineRuns

Epic ODC-7420: Improve Console UI experience for Software Supply Chain Security (SSCS) Use Cases

View the Description

Problem:

Enrich the OpenShift Pipelines experience for DevSecOps and Software Supply Chain Security use cases such as CVEs, SBOMs and signatures.

Goal:

Acceptance criteria:

CVE Summary as a column in PipelineRun list view
Link to SBOM for PipelineRuns
Add a badge/icon for chains.tekton.dev/signed=true for PipelineRuns
Move pipelinerun results section to output tab using static tab extension point.

Dependencies (External/Internal):

Design Artifacts:

Miro link - here

Exploration:

Project GUI enhancements doc

Story ODC-7423: Move pipelinerun results section to output tab

View the Description View the linked PRs

Description

As a user, I would like to see all the pipelinerun results in a new Output tab.

Acceptance Criteria

Move the PipelineRun results section to the new Output tab.
Use extension point to add the output tab, so that it can be overwritten later via dynamic plugin.

Additional Details:

Slack thread - https://redhat-internal.slack.com/archives/C060FCC5KU1/p1699442229040389?thread_ts=1699441759.578729&cid=C060FCC5KU1

https://github.com/openshift/console/pull/13323

Story ODC-7421: Show vulnerability column in the pipelinerun list page

View the Description View the linked PRs

Description

As a user, I want to see the vulnerabilities in the OCP console, so that I can identify and fix the issue as early as possible.

Acceptance Criteria

Show the Vulnerabilities column in the pipelinerun list page.
UI should use the new tekton results naming conventions (find the link below).
The UI needs to aggregate all results whose names contain the string SCAN_OUTPUT, e.g. ROXCTL_SCAN_OUTPUT and ACS_SCAN_OUTPUT.
Show signed badge next to the pipelinerun name if it is signed by chains.
Show View SBOM link in the kebab menu, if the pipeline run has SBOM attached to it.

Additional Details:

Tekton results naming conventions - doc

Batch the tekton results API request to avoid performance issues and use pagination to fetch the vulnerabilities when a user scrolls down in the list page.

Note: A pipelinerun can have multiple SCAN_OUTPUT results.
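The aggregation can be sketched as a sum over every result whose name contains SCAN_OUTPUT. This is a minimal sketch, not the console implementation. It assumes results are PipelineRun status.results entries with "name" and "value", and that each SCAN_OUTPUT value is JSON shaped like {"vulnerabilities": {"critical": 1, "high": 2, "medium": 0, "low": 3}}; the real value format is defined by the Tekton results naming conventions doc, so treat this shape as hypothetical.

```python
import json

SEVERITIES = ("critical", "high", "medium", "low")


def vulnerability_summary(results: list[dict]) -> dict[str, int]:
    """Sum per-severity counts across all *SCAN_OUTPUT* results of one PipelineRun."""
    totals = {severity: 0 for severity in SEVERITIES}
    for result in results:
        if "SCAN_OUTPUT" not in result.get("name", ""):
            continue
        try:
            counts = json.loads(result.get("value", ""))["vulnerabilities"]
            parsed = {s: int(counts.get(s, 0)) for s in SEVERITIES}
        except (ValueError, KeyError, TypeError, AttributeError):
            continue  # skip a malformed result instead of failing the whole row
        for severity in SEVERITIES:
            totals[severity] += parsed[severity]
    return totals
```

Parsing each result into a local dict before adding it means a malformed result contributes nothing, rather than a partial count.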

https://github.com/openshift/console/pull/13329

Story ODC-7422: Show SBOM link and Signed badge in the pipelinerun details page

View the Description View the linked PRs

Description

As a user, I want to see the SBOM link in the pipelinerun details page and if the pipelinerun is signed by chains then a signed badge should appear next to the pipelinerun name.

Acceptance Criteria

If a pipelinerun has produced an SBOM, then show the SBOM section in the pipelinerun details page.
Pipelineruns that are signed by chains should be indicated by a signed badge.
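The signed-badge condition can be sketched as an annotation lookup. This minimal sketch checks the chains.tekton.dev/signed="true" annotation named in the feature requirements. The SBOM check is left out because the Tekton results annotation it relies on is defined in the linked doc and not reproduced here.

```python
SIGNED_ANNOTATION = "chains.tekton.dev/signed"


def show_signed_badge(pipeline_run: dict) -> bool:
    """True when Tekton Chains has marked the PipelineRun as signed."""
    annotations = pipeline_run.get("metadata", {}).get("annotations") or {}
    return annotations.get(SIGNED_ANNOTATION) == "true"
```

Comparing against the exact string "true" avoids showing the badge for values such as "false" or a failed signing state.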

Additional Details:

Tekton results annotation to be used - https://docs.google.com/document/d/1_1YXFx0ymzjl4b9M_LDjmmGrEYey5mDrfTdNn_56hpM/edit#heading=h.u0j4yw1zdczm

Miro link

https://github.com/openshift/console/pull/13314

Story ODC-7424: Add e2e tests for Dance features

View the Description View the linked PRs

Acceptance Criteria

Add e2e tests for the dance features.

Feature OCPSTRAT-969: Resolve Self-managed HCP Post-GA Tech-debt

View the Description

Address technical debt around self-managed HCP deployments, including but not limited to

Add CA ConfigMaps into the trusted bundle for both the CPO and Ignition Server, improving trust and security.
Create dual stack clusters through CLI with or without default values, ensuring flexibility and user preference in network management.
Utilize CLI commands to disable default sources, enhancing customizability.
Benefit from less intrusive remote write failure modes.
...

Epic HOSTEDCP-1269: Post-GA Self-Managed Tech-Debt

View the Description

Goal

Address all the tasks we didn't finish for the GA

Collect and track all missing topics for self-managed and agent provider

Story HOSTEDCP-999: Create a periodic CI job for disconnected hypershift install with the agent CAPI provider

View the Description View the linked PRs

This can be based on the existing CAPI agent provider workflow, which already has an env var flag for disconnected

ref: https://github.com/openshift/release/blob/a1fb73c2e59df3d74eb35aca739bde7d6716963d/ci-operator/step-registry/assisted/baremetal/operator/capi/assisted-baremetal-operator-capi-ref.yaml#L12

Bug OCPBUGS-18602: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/hypershift/pull/2950

Bug OCPBUGS-18460: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/hypershift/pull/2950

Story HOSTEDCP-1278: Enhance hcp cli to support creating dual stack cluster

View the Description View the linked PRs

User Story:

As a (user persona), I want to be able to:

Use the HCP CLI to create a dual-stack cluster with CIDRs

so that I can achieve

We can directly use the HCP CLI to create a dual-stack cluster.

Acceptance Criteria:

Description of criteria:

The HCP CLI can be used directly to create a dual-stack cluster.
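A dual-stack network needs one IPv4 and one IPv6 CIDR, which a CLI can validate before building the cluster spec. This is a minimal sketch, not the HCP CLI code: it assumes the CIDRs arrive as strings (for example from repeated flags whose names are not shown) and keeps the user's order, since the first entry usually determines the primary address family.

```python
import ipaddress


def validate_dual_stack(cidrs: list[str]) -> list[str]:
    """Require exactly one IPv4 and one IPv6 CIDR; return them normalized, in input order."""
    networks = [ipaddress.ip_network(cidr, strict=True) for cidr in cidrs]
    if sorted(network.version for network in networks) != [4, 6]:
        raise ValueError(f"expected one IPv4 and one IPv6 CIDR, got {cidrs}")
    return [str(network) for network in networks]
```

ip_network raises ValueError for malformed input or for a CIDR with host bits set, so a single except clause in the caller covers both kinds of mistakes.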

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

(optional) https://github/com/link.to.enhancement/
(optional) https://issues.redhat.com/link.to.spike
Engineering detail 1
Engineering detail 2

This requires/does not require a design proposal.
This requires/does not require a feature gate.

https://github.com/openshift/hypershift/pull/3161

Bug OCPBUGS-18128: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/hypershift/pull/2950

Story HOSTEDCP-1322: Default the UpgradeType for Agent provider to InPlace

View the Description View the linked PRs

By default, the Agent provider creates clusters by delegating to the CLI. This is not a problem in itself, but if the UpgradeType is not set as a CLI argument, it defaults to Replace, which is mainly intended for cloud providers. We need to default the UpgradeType for the Agent provider to InPlace while still respecting the option set from the CLI. We also need to check with the KubeVirt team what the desired default is.
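The defaulting rule can be sketched as "explicit CLI value first, then a per-platform default". This is a minimal sketch, not the HyperShift code: it assumes an empty or missing CLI value means the flag was not passed, and it leaves KubeVirt on the Replace fallback until that team confirms its preferred default.

```python
from typing import Optional

# Per-platform defaults; only Agent is changed by this story.
PLATFORM_DEFAULTS = {"Agent": "InPlace"}
FALLBACK = "Replace"


def resolve_upgrade_type(platform: str, cli_value: Optional[str]) -> str:
    """Return the NodePool UpgradeType to use."""
    if cli_value:
        return cli_value  # always respect an explicit CLI option
    return PLATFORM_DEFAULTS.get(platform, FALLBACK)
```

Keeping the defaults in a table means a KubeVirt default can be added later as one entry, without touching the precedence logic.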

https://github.com/openshift/hypershift/pull/3273

Feature OCPSTRAT-974: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Epic OCPNODE-1886: [OCP Rebase] Rebase OCP control plane with Kubernetes v1.29

View the Description View the linked PRs

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

Epic Goal

The goal of this epic is to capture all the work required to update the OpenShift control plane to upstream Kubernetes v1.29.

Why is this important?

Rebasing is a required process for every OCP release to leverage all the new features implemented upstream.

Scenarios

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
...

Dependencies (internal and external)

Previous Work (Optional):

The following epic captured the previous rebase work for k8s v1.28:
https://issues.redhat.com/browse/STOR-1425

Open questions:

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

https://github.com/openshift/cluster-kube-scheduler-operator/pull/508

Story OCPNODE-1892: Rebase openshift/kubernetes with upstream kubernetes 1.29

View the Description View the linked PRs

Follow the rebase doc[1] and update the spreadsheet[2] that tracks the required commits to be cherry-picked. Rebase the o/k repo with the "merge=ours" strategy as mentioned in the rebase doc.
Save the last commit id in the spreadsheet for future reference.

Update the rebase doc if required.

[1] https://github.com/openshift/kubernetes/blob/master/REBASE.openshift.md
[2] https://docs.google.com/spreadsheets/d/10KYptJkDB1z8_RYCQVBYDjdTlRfyoXILMa0Fg8tnNlY/edit#gid=1957024452

Prev. Ref:
https://github.com/openshift/kubernetes/pull/1646

Feature OCPSTRAT-98: Updated boot images: Phase 1 - GCP Tech Preview

Feature Overview

OCP 4 clusters still maintain pinned boot images. We have numerous clusters installed that have boot media pinned to first boot images as early as 4.1. In the future these boot images may not be certified by the OEM and may fail to boot on updated datacenter or cloud hardware platforms. These "pinned" boot images should be updatable so that customers can avoid this problem and, better still, scale out nodes with boot media that matches the running cluster version.

Phase one: GCP tech preview

Epic MCO-589: Update boot images for GCP (tech preview)

[stub]

See ongoing exploration here: https://docs.google.com/document/d/1GBhrBlOddG_ktIEw2alVyDZrBWqHoZD0M1ZjkiIjpIQ/edit#heading=h.vi1faxuezcs1

Story MCO-679: Add updated boot images support for GCP

The bootimage references are currently saved in the machineset by the OpenShift installer and are thereafter unmanaged. The machineset object is not updated on an upgrade, so any node scaled up using it will boot with the original “install” bootimage.

The “new” boot image references are available in the configmap/coreos-bootimages in the MCO namespace. Here is the PR that implemented this; it’s basically a CVO manifest that pulls from this file in the installer binary. Hence, they are updated on an upgrade. They can also be printed to the console with the following installer command: openshift-install coreos print-stream-json.

Implementing this portion should be as simple as iterating through each machineset and updating the disk image by cross-referencing the configmap with the architecture, region and platform used in the machineset. This is where the installer figures out the bootimage during an install, so we could model a bit after this.

It looks like we have Machine API objects for every platform-specific providerSpec (formerly called providerConfig) we support here. We'd still have to special-case the image/AMI portion of this, but we should be able to leverage some of the work done in the installer (to generate machinesets, for example, for GCP) to understand how the image reference is stored for every platform.
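A minimal Python sketch of the per-machineset step for GCP only. It assumes the coreos-bootimages stream JSON has already been parsed into a dict with an `architectures.<arch>.images.gcp` entry holding `project` and `name`, that the GCP providerSpec lists `disks` carrying `boot` and `image` fields, and that the caller supplies the architecture. The function names are hypothetical; the MCO controller itself is written in Go against the Machine API types.

```python
import copy
from typing import Any


def gcp_image_from_stream(stream: dict[str, Any], arch: str) -> str:
    """Build a GCP image path from parsed CoreOS stream metadata (layout assumed)."""
    gcp = stream["architectures"][arch]["images"]["gcp"]
    return f"projects/{gcp['project']}/global/images/{gcp['name']}"


def reconcile_gcp_machineset(
    machineset: dict[str, Any], stream: dict[str, Any], arch: str = "x86_64"
) -> tuple[dict[str, Any], bool]:
    """Return an updated copy of a GCP MachineSet and whether its boot image changed.

    The input object is not mutated, so the caller can diff and patch it.
    """
    updated = copy.deepcopy(machineset)
    provider = updated["spec"]["template"]["spec"]["providerSpec"]["value"]
    new_image = gcp_image_from_stream(stream, arch)
    changed = False
    for disk in provider.get("disks", []):
        # Only the boot disk carries the RHCOS image; data disks are left alone.
        if disk.get("boot") and disk.get("image") != new_image:
            disk["image"] = new_image
            changed = True
    return updated, changed
```

Returning a `changed` flag keeps the sketch idempotent: a second pass over an already-updated machineset reports no change, which is what a controller triggered by both machineset and configmap events would want.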

Done when:

For the MVP, the goal is to:

add a new subcontroller within the MCC. This subcontroller can be triggered by a listener on the machinesets and by any changes to the "golden" configmap mentioned above
We'll support GCP to start. I'll make a follow-up card for the other platforms, but I'm open to adding more here if needed!

https://github.com/openshift/machine-config-operator/pull/4083

Feature OCPSTRAT-980: Enforce Data/Secret Encryption for the Control-Planes, Etcd, and Nodes

Feature Overview (Goal Summary)

This feature is dedicated to enhancing data security and implementing encryption best practices across control-planes, Etcd, and nodes for HyperShift with Azure. The objective is to ensure that all sensitive data, including secrets, is encrypted, thereby safeguarding against unauthorized access and ensuring compliance with data protection regulations.

Epic HOSTEDCP-1234: Secret encryption

User Story:

As a service provider/consumer, I want to make sure secrets are encrypted with a key owned by the consumer.

Acceptance Criteria:

Expose and propagate input for KMS secret encryption, similar to what we do in AWS.

https://github.com/openshift/hypershift/blob/90aa44d064f6fe476ba4a3f25973768cbdf05eb5/api/v1beta1/hostedcluster_types.go#L1765-L1790

See related discussion:

https://redhat-internal.slack.com/archives/CCV9YF9PD/p1696950850685729

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

(optional) https://github/com/link.to.enhancement/
(optional) https://issues.redhat.com/link.to.spike
Engineering detail 1
Engineering detail 2

This requires/does not require a design proposal.
This requires/does not require a feature gate.

https://github.com/openshift/hypershift/pull/3183

Feature OCPSTRAT-981: Implement Lifecycle & Image Management for NodePools

Feature Overview (Goal Summary)

This feature focuses on the optimization of resource allocation and image management within NodePools. This will include enabling users to specify resource groups at NodePool creation, integrating external DNS support, ensuring Cluster API (CAPI) and other images are sourced from the payload, and utilizing Image Galleries for Azure VM creation.

Epic HOSTEDCP-1237: Ensure CAPI and any other image comes from payload

User Story:

As a HyperShift consumer, I want to make sure images are versioned with the control plane.

Acceptance Criteria:

~~Fetch CAPI images from payload, currently are pinned.~~ ~~HOSTEDCP-1227~~
Review that there are no more pinned images

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

The RHCOS image needs to be pulled from the release image rather than being hard-coded here.

This requires/does not require a design proposal.
This requires/does not require a feature gate.

https://github.com/openshift/hypershift/pull/3177

Epic HOSTEDCP-1227: Get CAPZ image from the OCP release image

User Story:

As a user of HyperShift, I want the cluster API Azure (CAPZ) image to come from the OCP release image rather than being hardcoded in the HyperShift code so that I can always use the latest CAPZ image related to the OCP release image.

Acceptance Criteria:

The CAPZ image comes from the OCP release image.

Out of Scope:

N/A

Engineering Details:

The CAPZ image only comes from the OCP release image if:
- the CAPZ image is not set through an environment variable
- the CAPZ image is not set through an annotation on the HostedCluster CR
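The fallback order above amounts to a simple lookup chain. In this Python sketch the environment variable name, the annotation key and the release-image component name are hypothetical placeholders, and the relative priority of the two overrides (annotation before environment variable) is an assumption: the story only says that either one disables the release-image lookup.

```python
from typing import Mapping, Optional

# Hypothetical keys for illustration only; the story does not name the real ones.
CAPZ_ENV_VAR = "EXAMPLE_CAPZ_IMAGE"
CAPZ_ANNOTATION = "example.openshift.io/capz-image"
CAPZ_RELEASE_COMPONENT = "azure-cluster-api-controllers"


def resolve_capz_image(
    env: Mapping[str, str],
    annotations: Mapping[str, str],
    release_images: Mapping[str, str],
) -> str:
    """Pick the CAPZ image, using the OCP release image only when no override is set."""
    # Assumed precedence between the overrides: HostedCluster annotation, then env var.
    override: Optional[str] = annotations.get(CAPZ_ANNOTATION) or env.get(CAPZ_ENV_VAR)
    if override:
        return override
    try:
        return release_images[CAPZ_RELEASE_COMPONENT]
    except KeyError:
        raise ValueError(f"release image has no {CAPZ_RELEASE_COMPONENT!r} component") from None
```

Failing loudly when the component is missing, instead of falling back to a pinned default, keeps the "no more images pinned" goal of the parent epic intact.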

This requires/does not require a design proposal.
This requires/does not require a feature gate.

https://github.com/openshift/hypershift/pull/3074

Epic HOSTEDCP-1229: Move azure cloud provider to out of tree

User Story:

As a user of HyperShift on Azure, I would like the reconciliation process for the cloud controller manager to run the Azure external provider like we do for AWS and other platforms so that the NodePool nodes will join the HostedCluster.

Acceptance Criteria:

The cloud controller manager reconciliation process takes the Azure external provider into account.
The NodePool nodes join the HostedCluster.

Out of Scope:

Any other issues with Azure HostedClusters discovered during development.

Engineering Details:

Cloud Provider Azure Docs - https://cloud-provider-azure.sigs.k8s.io/
Cloud Provider Azure Installation Configuration Docs - https://cloud-provider-azure.sigs.k8s.io/install/configs/
Cloud Provider Azure GitHub Repo - https://github.com/kubernetes-sigs/cloud-provider-azure

This requires/does not require a design proposal.
This requires/does not require a feature gate.

https://github.com/openshift/hypershift/pull/3086

Feature OCPSTRAT-982: Network Optimization and Management Enhancements

Feature Overview (aka. Goal Summary)

The feature is specifically designed to concentrate on network optimizations, particularly improvements in how the network is configured and how access is managed using Cluster API (CAPI) for Azure (potentially running the control plane on AKS).

Epic HOSTEDCP-1236: Enable external DNS support

User Story:

As a cluster service provider / consumer, I want the hosted control plane endpoints to be resolvable through a known DNS zone.

Acceptance Criteria:

External DNS support works for Azure Hosted Clusters as in AWS, exposing the endpoints when the hostname is set in the input https://github.com/openshift/hypershift/blob/main/api/v1beta1/hostedcluster_types.go#L553-L556

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

(optional) https://github/com/link.to.enhancement/
(optional) https://issues.redhat.com/link.to.spike
Engineering detail 1
Engineering detail 2

This requires/does not require a design proposal.
This requires/does not require a feature gate.

https://github.com/openshift/hypershift/pull/3233

Feature OCPSTRAT-990: [GA] Allow customer managed DNS solutions for GCP: Implementation

Goal:

As an administrator, I would like to use my own managed DNS solution instead of only specific openshift-install supported DNS services (such as AWS Route53, Google Cloud DNS, etc...) for my OpenShift deployment.

Problem:

While cloud-based DNS services provide convenient hostname management, there are a number of regulatory (ITAR) and operational constraints that prohibit customers from using those DNS hosting services on public cloud providers.

Why is this important:

Provides customers with the flexibility to leverage their own custom managed ingress DNS solutions already in use within their organizations.
Required for regions like AWS GovCloud, in which many customers may not be able to use the Route53 service (only for commercial customers) for either internal or ingress DNS.
An OpenShift-managed internal DNS solution ensures cluster operation and that nothing breaks during updates.

Dependencies (internal and external):

DNS work for KNI
https://docs.google.com/document/d/1VsukDGafynKJoQV8Au-dvtmCfTjPd3X9Dn7zltPs8Cc/edit

This is a prerequisite for the internal clusters epic: https://docs.google.com/document/d/1gxtIW6OlasVQtQLTyOl6f9H9CMuxiDNM5hQFNd3xubE/edit#

Prioritized epics + deliverables (in scope / not in scope):

Ability to bootstrap cluster without an OpenShift managed internal DNS service running yet
Scalable, cluster (internal) DNS solution that's not dependent on the operation of the control plane (in case it goes down)
Ability to automatically propagate DNS record updates to all nodes running the DNS service within the cluster
Option for connecting the cluster to customers' ingress DNS solutions already in place within their organizations

Estimate (XS, S, M, L, XL, XXL):

Previous Work:

Open questions:

Link to Epic: https://docs.google.com/document/d/1OBrfC4x81PHhpPrC5SEjixzg4eBnnxCZDr-5h3yF2QI/edit?usp=sharing

Epic CORS-2949: Add in-cluster Ingress *.apps DNS resolution

Epic Goal

At this point in the feature, we would have a working in-cluster CoreDNS pod capable of resolving API and API-Int URLs.

This Epic details the work required to augment this CoreDNS pod to also resolve the *.apps URL. In addition, it will include changes to prevent the Ingress Operator from configuring the cloud DNS after the ingress LBs have been created.
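One way CoreDNS can answer for every name under *.apps is its `template` plugin, since the `hosts` plugin does not match wildcards. The Python sketch below renders such a stanza from a cluster domain and an ingress IP; the function name and the 60-second TTL are arbitrary choices, and whether the in-cluster Corefile uses exactly this stanza is an assumption.

```python
def apps_wildcard_template(cluster_domain: str, ingress_ip: str, ttl: int = 60) -> str:
    """Render a CoreDNS `template` stanza answering A queries for *.apps.<cluster_domain>."""
    zone = f"apps.{cluster_domain}"
    # Query names arrive fully qualified (trailing dot); escape the dots in the zone.
    pattern = r".*\." + zone.replace(".", r"\.") + r"\.$"
    return "\n".join([
        f"template IN A {zone} {{",
        f"    match {pattern}",
        # {{ .Name }} is CoreDNS template syntax for the query name; the doubled
        # braces below only escape it inside the f-string.
        f'    answer "{{{{ .Name }}}} {ttl} IN A {ingress_ip}"',
        # Names that do not match the regex continue to the next plugin.
        "    fallthrough",
        "}",
    ])


print(apps_wildcard_template("mycluster.example.com", "10.0.0.3"))
```

The stanza would sit inside the same server block as the api/api-int entries, so a single CoreDNS instance serves all three names.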

Why is this important?

Scenarios

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
...

Dependencies (internal and external)

Previous Work (Optional):

Open questions:

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Story CORS-2315: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/installer/pull/7846

Epic CORS-2465: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Story CORS-2818: Add ability to render Cloud LB IPs for API, API-Int and Ingress

User Story:

As a (user persona), I want to be able to:

Capability 1
Capability 2
Capability 3

so that I can achieve

Outcome 1
Outcome 2
Outcome 3

Acceptance Criteria:

Description of criteria:

Upstream documentation
Point 1
Point 2
Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

(optional) https://github.com/openshift/enhancements/pull/1468
(optional) https://issues.redhat.com/link.to.spike
https://github.com/openshift/baremetal-runtimecfg is responsible for generating the Corefile for CoreDNS. Supply the API and API-Int LB IP addresses to baremetal-runtimecfg using new optional flags: "cloud-ext-lb-ips" and "cloud-int-lb-ips".
When these input parameters are present, generate the CoreDNS Corefile with entries for the api and api-int URLs using these LB IP addresses.
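A minimal Python sketch of that generation step: it renders a Corefile whose `hosts` block maps api and api-int to the addresses that would arrive via "cloud-ext-lb-ips" and "cloud-int-lb-ips", and forwards every other name upstream. The function name, the extra `errors` and `cache` plugins and the upstream resolver path are assumptions; baremetal-runtimecfg itself is written in Go.

```python
from typing import Sequence


def render_corefile(
    cluster_domain: str,
    ext_lb_ips: Sequence[str],
    int_lb_ips: Sequence[str],
    upstream: str = "/etc/resolv.conf",
) -> str:
    """Render a Corefile resolving api/api-int to cloud LB IPs (hypothetical helper)."""
    records = [f"{ip} api.{cluster_domain}" for ip in ext_lb_ips]
    records += [f"{ip} api-int.{cluster_domain}" for ip in int_lb_ips]
    lines = [". {", "    errors", "    hosts {"]
    lines += [f"        {record}" for record in records]
    # Names not listed in the hosts block fall through to the forward plugin.
    lines += ["        fallthrough", "    }", f"    forward . {upstream}", "    cache 30", "}"]
    return "\n".join(lines) + "\n"


print(render_corefile("mycluster.example.com", ["203.0.113.10"], ["10.0.0.5"]))
```

Because the flags are optional, a caller would only emit these host entries when the LB IPs were actually supplied, leaving the existing Corefile generation unchanged otherwise.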

This requires/does not require a design proposal.
This requires/does not require a feature gate.

https://github.com/openshift/baremetal-runtimecfg/pull/286

Story CORS-2798: User Managed DNS: Create Load Balancer Config

User Story:

As a (user persona), I want to be able to:

Create a WritableAsset or Asset to represent the config map for load balancer information
No current asset dependencies
Create a function `CreateLBConfigMap` that accepts the name, internal lb data, public lb data, and platform name. The function will return the string representation of the config map

so that I can achieve

A ConfigMap that will be loaded onto cluster nodes during installation for custom DNS solutions.
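The `CreateLBConfigMap` contract described above (name, internal LB data, public LB data and platform name in, ConfigMap text out) can be sketched in Python as follows. The namespace default, the data keys and the choice to emit JSON (which is also valid YAML) are hypothetical; the installer's real function is Go and its exact output format is not specified here.

```python
import json
from typing import Sequence


def create_lb_config_map(
    name: str,
    internal_lb_ips: Sequence[str],
    public_lb_ips: Sequence[str],
    platform: str,
    namespace: str = "openshift-config",  # hypothetical default
) -> str:
    """Return a ConfigMap manifest, as a string, describing cluster load balancers."""
    config_map = {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        # ConfigMap data values must be strings, so the IP lists are JSON-encoded.
        "data": {
            "platform": platform,
            "internalLoadBalancerIPs": json.dumps(list(internal_lb_ips)),
            "publicLoadBalancerIPs": json.dumps(list(public_lb_ips)),
        },
    }
    # sort_keys makes the output deterministic, which keeps generated assets diffable.
    return json.dumps(config_map, indent=2, sort_keys=True)
```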

Acceptance Criteria:

Description of criteria:

Upstream documentation
Point 1
Point 2
Point 3

(optional) Out of Scope:

If a zone has not been granted permission to be shared across projects (if in different projects), then the install will fail.

https://github.com/openshift/installer/pull/7631

Epic CORS-2460: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Story CORS-2813: Pass LB ConfigMap as parameter to MCO during bootstrap

User Story:

As a (user persona), I want to be able to:

depend on OpenShift to self-host its DNS for api, api-int and ingress during the bootstrap process
Capability 2
Capability 3

so that I can achieve

Outcome 1
Outcome 2
Outcome 3

Acceptance Criteria:

Description of criteria:

Upstream documentation
Point 1
Point 2
Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

(optional) https://github.com/openshift/enhancements/pull/1468
(optional) https://issues.redhat.com/link.to.spike
Engineering detail 1
Engineering detail 2

This requires/does not require a design proposal.
This requires/does not require a feature gate.

https://github.com/openshift/installer/pull/7662

Story CORS-2952: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/installer/pull/7796

Story CORS-3029: Add Cloud LB IPs to Platform Status of Infra CR

User Story:

As a (user persona), I want to be able to:

Capability 1
Capability 2
Capability 3

so that I can achieve

Outcome 1
Outcome 2
Outcome 3

Acceptance Criteria:

Description of criteria:

Upstream documentation
Point 1
Point 2
Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

(optional) https://github/com/link.to.enhancement/
(optional) https://issues.redhat.com/link.to.spike
Engineering detail 1
Engineering detail 2

This requires/does not require a design proposal.
This requires/does not require a feature gate.

https://github.com/openshift/installer/pull/7837

Spike CORS-3190: Custom DNS: Add Infra CR to bootstrap ignition

Append an Infra CR with only the GCP PlatformStatus field set (containing the LB IPs), and no other fields, especially not the Spec, at the end of the bootstrap ignition. The theory is that when the Infra CR is applied from the bootstrap ignition, the full infra manifest is applied first. As the remaining assets in the ignition files are processed, the Infra CR appears again with only the LB IPs set, so it updates the existing Infra CR already applied to the cluster.
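A Python sketch of the partial object that would be appended. The `cloudLoadBalancerConfig` / `clusterHosted` field names are modeled on the openshift/api Infrastructure types as we understand them and should be treated as assumptions, as should the IP values; the point the sketch illustrates is that only `status.platformStatus` is populated and `spec` is absent.

```python
import json
from typing import Any, Sequence


def lb_ips_infra_cr(
    api_ips: Sequence[str], api_int_ips: Sequence[str], ingress_ips: Sequence[str]
) -> dict[str, Any]:
    """Build an Infrastructure object carrying only GCP platform-status LB IPs."""
    return {
        "apiVersion": "config.openshift.io/v1",
        "kind": "Infrastructure",
        "metadata": {"name": "cluster"},
        # Deliberately no "spec": the full manifest applied earlier owns it.
        "status": {
            "platformStatus": {
                "type": "GCP",
                "gcp": {
                    "cloudLoadBalancerConfig": {  # assumed field names
                        "dnsType": "ClusterHosted",
                        "clusterHosted": {
                            "apiLoadBalancerIPs": list(api_ips),
                            "apiIntLoadBalancerIPs": list(api_int_ips),
                            "ingressLoadBalancerIPs": list(ingress_ips),
                        },
                    },
                },
            },
        },
    }


# Serialized form that would be appended as the last manifest in the bootstrap ignition.
manifest = json.dumps(lb_ips_infra_cr(["203.0.113.10"], ["10.0.0.5"], ["203.0.113.20"]), indent=2)
```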

https://github.com/openshift/installer/pull/7888

Story CORS-2798: User Managed DNS: Create Load Balancer Config

User Story:

As a (user persona), I want to be able to:

Create a WritableAsset or Asset to represent the config map for load balancer information
No current asset dependencies
Create a function `CreateLBConfigMap` that accepts the name, internal lb data, public lb data, and platform name. The function will return the string representation of the config map

so that I can achieve

A ConfigMap that will be loaded onto cluster nodes during installation for custom DNS solutions.

Acceptance Criteria:

Description of criteria:

Upstream documentation
Point 1
Point 2
Point 3

(optional) Out of Scope:

If a zone has not been granted permission to be shared across projects (if in different projects), then the install will fail.

https://github.com/openshift/installer/pull/7631

Feature OCPSTRAT-993: Use Bound Service account tokens when generating image pull secrets

Feature Overview (aka. Goal Summary)

Stop generating long-lived service account tokens. Long-lived service account tokens are currently generated in order to then create an image pull secret for the internal image registry. This feature calls for using the TokenRequest API to generate a bound service account token for use in the image pull secret.
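A Python sketch of the two objects involved: a TokenRequest body for the service account (posted to its `token` subresource) and a `kubernetes.io/dockercfg` pull secret built from the returned token. The audience, expiry, username and registry host used here are placeholders, not the values OpenShift's controller actually uses.

```python
import base64
import json
from typing import Any, Sequence


def token_request(audiences: Sequence[str], expiration_seconds: int = 3600) -> dict[str, Any]:
    """Body for POST /api/v1/namespaces/<ns>/serviceaccounts/<sa>/token."""
    return {
        "apiVersion": "authentication.k8s.io/v1",
        "kind": "TokenRequest",
        "spec": {"audiences": list(audiences), "expirationSeconds": expiration_seconds},
    }


def dockercfg_pull_secret(
    name: str, namespace: str, registry: str, username: str, token: str
) -> dict[str, Any]:
    """Image pull secret embedding a bound (time-limited) service account token."""
    auth = base64.b64encode(f"{username}:{token}".encode()).decode()
    dockercfg = {registry: {"username": username, "password": token, "auth": auth}}
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "kubernetes.io/dockercfg",
        "metadata": {"name": name, "namespace": namespace},
        # Secret "data" values are base64-encoded.
        "data": {".dockercfg": base64.b64encode(json.dumps(dockercfg).encode()).decode()},
    }
```

Because a bound token expires, whatever creates the secret also has to refresh it before `expirationSeconds` elapses, something a long-lived token never required.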

Goals (aka. expected user outcomes)

Use the TokenRequest API to create image pull secrets.

Performance benefits:

One fewer secret created per service account. This will result in at least three fewer secrets generated per namespace.

Security benefits:

Avoids long-lived tokens, which are no longer recommended because they present a possible security risk.

Requirements (aka. Acceptance Criteria):

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

High-level list of items that are out of scope. Initial completion during Refinement status.

Background

Provide any additional context needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

Interoperability Considerations

Epic API-1644: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Feature TELCOSTRAT-160: Expeditious SNO Upgrade and Rollback To Meet Telco Far Edge KPIs

Feature Overview

Telecommunications providers look to displace Physical Network Functions (PNFs) with modern Virtual Network Functions (VNFs) at the Far Edge. Single Node OpenShift, as the CaaS layer in the vRAN vDU architecture, must achieve a higher standard of OpenShift upgrade speed and efficiency to compare with PNFs.

Telecommunications providers currently deploy Firmware-based Physical Network Functions (PNFs) in their RAN solutions. These PNFs can be upgraded quickly due to their monolithic nature and image-based download-and-reboot upgrades. Furthermore, they often have the ability to retry upgrades and to roll back to the previous image if the new image fails. These Telcos are looking to displace PNFs with virtual solutions, but will not do so unless the virtual solutions have comparable operational KPIs to the PNFs.

Goals

Service Downtime

Service (vDU) Downtime is the time when the CNF is not operational and therefore no traffic is passing through the vDU. This has a significant impact, as it degrades the customer’s service (5G->4G) or causes an outright service outage. These disruptions are scheduled into Maintenance Windows (MW), but the Telecommunications Operator's primary goal is to keep service running, so getting vRAN solutions with OpenShift to near-PNF-like Service Downtime is, and always will be, a primary requirement.

Upgrade Duration

Upgrading OpenShift is only one of many operations that occur during a Maintenance Window. Reducing the CaaS upgrade duration is meaningful to many teams within a Telecommunications Operator's organization, as this duration fits into a larger set of activities that put pressure on the time available for Red Hat software. OpenShift must reduce the upgrade duration significantly to compete with existing PNF solutions.

Failure Detection and Remediation

As mentioned above, the Service Downtime disruption duration must be as small as possible, including when there are failures. Hardware failures fall into a category called Break+Fix and are covered by TELCOSTRAT-165. Software failures must be detected and remediated.

Detection includes monitoring the upgrade for stalls and failures and remediation would require the ability to rollback to the previously well-known-working version, prior to the failed upgrade.

Implicit Requirements

Upgrade To Any Release

The OpenShift product support terms are too short for Telco use cases, in particular vRAN deployments. The risk of Service Downtime drives Telecommunications Operators to a certify-deploy-and-then-don’t-touch model. One specific request from our largest Telco Edge customer is for 4 years of support.

These longer support needs drive a misalignment with the EUS->EUS upgrade path and drive the requirement that the Single Node OpenShift deployment can be upgraded from OCP X.y.z to any future [X+1].[y+1].[z+1], where X+1 and y+1 are decided by the Telecommunications Operator depending on timing and the desired feature set, and z+1 is determined through Red Hat, vDU vendor, and customer maintenance and engineering validation.

Alignment with Related Break+Fix and Installation Requirements

Red Hat is challenged with improving multiple OpenShift Operational KPIs by our telecommunications partners and customers. Improved Break+Fix is tracked in TELCOSTRAT-165 and improved Installation is tracked in TELCOSTRAT-38.

Seamless Management within RHACM

Whatever methodology achieves the above requirements must ensure that the customer has a pleasant experience via RHACM and Red Hat GitOps. Red Hat’s current install and upgrade methodology is via RHACM and any new technologies used to improve Operational KPIs must retain the seamless experience from the cluster management solution. For example, after a cluster is upgraded it must look the same to a RHACM Operator.

Seamless Management when On-Node Troubleshooting

Whatever methodology achieves the above requirements must ensure that a technician troubleshooting a Single Node OpenShift deployment has a pleasant experience. All commands issued on the node must return output as they would have before the upgrade.

Requirements

Y-Stream

CaaS Upgrade must complete in <2 hrs (single MW) (20 mins for PNF)
Minimize service disruption (customer impact) < 30 mins [cumulative] (5 mins for PNF)
Support In-Place Upgrade without vDU redeployment
Support backup/restore and rollback to known working state in case of unexpected upgrade failures

Z-Stream

CaaS Patch Release (z-release) Upgrade Requirements:
CaaS Upgrade must complete <15 mins
Minimize service disruption (customer impact) < 5 mins
Support In-Place Upgrade without vDU redeployment
Support backup/restore and rollback to known working state in case of unexpected upgrade failures

Rollback

Restore and Rollback functionality must complete in 30 minutes or less.

Upgrade Path

Allow for upgrades to any future supported OCP release.

References

Epic MGMT-15783: Cluster Reconfiguration

Feature goal (what are we trying to solve here?)

A systemd service that runs on the first boot of a golden image and configures the following:

1. Networking (the internal IP address requires special attention)

2. Update the hostname (~~MGMT-15775~~)

3. Execute recert (regenerate certs, Cluster name and base domain ~~MGMT-15533~~)

4. Start kubelet

5. Apply the personalization info:

Pull Secret
Proxy
ICSP
DNS server
SSH keys

DoD (Definition of Done)

The API for configuring the networking, hostname, personalization manifests and executing recert is well defined
The service configures the required attributes and starts a functional OCP cluster
CI job that reconfigures a golden image and forms a functional cluster

Does it need documentation support?

If the answer is "yes", please make sure to check the corresponding option.

Feature origin (who asked for this feature?)

- Internal request
  - This is required as part of https://issues.redhat.com/browse/OCPSTRAT-620
- Catching up with OpenShift

Reasoning (why it’s important?)

The following features depend on this functionality:

Competitor analysis reference

Do our competitors have this feature?
- Yes, they have it and we can have some reference
- No, it's unique or explicit to our product
- No idea. Need to check

Feature usage (do we have numbers/data?)

We have no data - the feature doesn’t exist anywhere
Related data - the feature doesn’t exist but we have info about the usage of associated features that can help us
- Please list all related data usage information
We have the numbers and can relate to them
- Please list all related data usage information

Feature availability (why should/shouldn't it live inside the UI/API?)

Please describe the reasoning behind why it should/shouldn't live inside the UI/API
If it's for a specific customer we should consider using AMS
Does this feature exist in the UI of other installers?

Task MGMT-16061: SNO: dnsmasq and force dns should be configurable in order to support ip change

Because we want to support IBU (image-based upgrade) with a single IP, the dnsmasq and force-DNS configurations for SNO should be made configurable in order to support an IP change.

Feature TELCOSTRAT-18: CPU Manager: mix of exclusive and shared CPUs for a container

Problem statement

DPDK applications require dedicated CPUs, isolated from any preemption (other processes, kernel threads, interrupts). This can be achieved with the “static” policy of the CPU manager: the container resources need to include an integer number of CPUs of equal value in “limits” and “requests”, and the pod must be in the Guaranteed QoS class, so memory limits and requests must be set and equal as well. For instance, to get six exclusive CPUs:

spec:
  containers:
  - name: cnf
    image: mycnf
    resources:
      limits:
        cpu: "6"
        memory: "1Gi"  # illustrative value; any amount works if limits equal requests
      requests:
        cpu: "6"
        memory: "1Gi"

The six CPUs are dedicated to that container; however, non-trivial (meaning real) DPDK applications do not fully use all of those CPUs, as there is always at least one CPU running a slow path: processing configuration, printing logs (among the DPDK coding rules: no syscalls in PMD threads, or you are in trouble). Even the DPDK PMD drivers and core libraries include pthreads that are intended to sleep; these are infrastructure pthreads, processing link-change interrupts for instance.

Can we envision going with two processes, one with the isolated cores and one with the slow-path ones, so we can have two containers? Unfortunately, no: a multi-process design, where only dedicated pthreads would run in one process, is not an option, as DPDK multi-process is being deprecated upstream and never caught on because it never worked properly. Fixing it and changing the DPDK architecture to systematically have two processes is absolutely not possible within a year, and would require all DPDK applications to be rewritten. Knowing that the first and current multi-process implementation is a failure, nothing guarantees that a second one would be successful.

The slow-path CPUs only consume a fraction of a real CPU and can safely run on the “shared” CPU pool of the CPU Manager; however, container specifications do not accept requests for two kinds of CPUs, for instance:

spec:
  containers:
  - name: cnf
    image: mycnf
    resources:
      limits:
        cpu_dedicated: "4"
        cpu_shared: "20m"
      requests:
        cpu_dedicated: "4"
        cpu_shared: "20m"

Why do we care about allocating one extra CPU per container?

Allocating one extra CPU means allocating an additional physical core, as the CPUs running a DPDK application should run on a dedicated physical core to get maximum and deterministic performance, since caches and CPU units are shared between the two hyperthreads.

CNFs are built with a minimum number of CPUs per container. Today this is still between 10 and 20, sometimes more, but the intent is to decrease this number of CPUs and increase the number of containers, as this is the “cloud native” way and avoids wasting resources on containers too large to schedule, as in the VNF days (the tetris effect).

Let’s take a realistic example, based on a real RAN CNF: running 6 containers with dedicated CPUs on a worker node, each with a slow path requiring 0.1 CPU, means that we waste about 5 CPUs, that is, 3 physical cores. With real-life numbers:

For a single datacenter composed of 100 nodes, we waste 300 physical cores
For a single datacenter composed of 500 nodes, we waste 1500 physical cores
For Single Node OpenShift deployed on 1 million nodes, we waste 3 million physical cores

Intel's public CPU price per core is around 150 US$, not even taking into account the ecological cost of wasting (rare) materials, electricity, and cooling.
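The waste figures above can be reproduced with a short calculation. This sketch assumes each of the 6 containers is granted one extra exclusive CPU for its slow path, that the slow path uses 0.1 of it, and that each physical core carries 2 hyperthreads; these are our reading of the example, not stated parameters. The result is 5.4 wasted logical CPUs, which the text rounds to 5, and the price uses the approximate 150 US$ per core quoted above.

```python
import math

# Assumptions (our reading of the example above, not stated parameters):
CONTAINERS_PER_NODE = 6   # containers with dedicated CPUs on one worker node
SLOW_PATH_CPU = 0.1       # fraction of the extra CPU actually used by the slow path
THREADS_PER_CORE = 2      # hyperthreads per physical core
PRICE_PER_CORE_USD = 150  # approximate public price per core quoted above


def wasted_cores_per_node(
    containers: int = CONTAINERS_PER_NODE,
    slow_path_cpu: float = SLOW_PATH_CPU,
    threads_per_core: int = THREADS_PER_CORE,
) -> int:
    """Physical cores wasted on one node by dedicating a whole CPU to each slow path."""
    wasted_cpus = containers * (1.0 - slow_path_cpu)  # 6 * 0.9 = 5.4 logical CPUs
    return math.ceil(wasted_cpus / threads_per_core)  # ceil(2.7) = 3 physical cores


def fleet_waste(nodes: int) -> tuple[int, int]:
    """Return (wasted physical cores, approximate cost in US$) across a fleet."""
    cores = nodes * wasted_cores_per_node()
    return cores, cores * PRICE_PER_CORE_USD


for nodes in (100, 500, 1_000_000):
    cores, cost = fleet_waste(nodes)
    print(f"{nodes:>9} nodes: {cores:>9} cores wasted, ~{cost:,} US$")
```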

Goals

Implement an equivalent of Nokia's CPU Pooler: a way to allocate dedicated and shared CPUs to a given container, and a way within the container to know which CPUs belong to which pool, so the CNF running in the container can properly pin its pthreads on the available CPUs.
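To illustrate the in-container half of that goal (discovering the pools and pinning threads), here is a minimal Python sketch. It assumes the pools are exposed to the container as Linux cpuset-list strings in two environment variables; the names `EXAMPLE_EXCLUSIVE_CPUS` and `EXAMPLE_SHARED_CPUS` are hypothetical, not the CPU Pooler's or OpenShift's actual interface. A real DPDK application would do the equivalent in C with `pthread_setaffinity_np`.

```python
import os
from typing import Mapping

# Hypothetical variable names, used only for this sketch.
EXCLUSIVE_ENV = "EXAMPLE_EXCLUSIVE_CPUS"
SHARED_ENV = "EXAMPLE_SHARED_CPUS"


def parse_cpuset(spec: str) -> set[int]:
    """Parse a Linux cpuset list such as "2-5,8" into {2, 3, 4, 5, 8}."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            if first > last:
                raise ValueError(f"invalid CPU range: {part!r}")
            cpus.update(range(first, last + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpu_pools(env: Mapping[str, str] = os.environ) -> tuple[set[int], set[int]]:
    """Return the (exclusive, shared) CPU sets advertised to the container."""
    return parse_cpuset(env.get(EXCLUSIVE_ENV, "")), parse_cpuset(env.get(SHARED_ENV, ""))


def pin_calling_thread(cpus: set[int]) -> None:
    """Restrict the caller to `cpus` (Linux only).

    pid 0 refers to the caller; on Linux the underlying syscall applies to the
    calling thread, so each pthread can pin itself.
    """
    os.sched_setaffinity(0, cpus)
```

In this model a PMD thread would call `pin_calling_thread` with a single exclusive CPU, while slow-path and infrastructure threads would pin themselves to the shared set.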

Requirements

This section: a list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.

Requirement	Notes	isMvp?
CI - MUST be running successfully with test automation	This is a requirement for ALL features.	YES
Release Technical Enablement	Provide necessary release enablement details and documents.	YES

Questions to answer…

Would an implementation based on annotations be possible rather than an implementation requiring a container (so pod) definition change, like the CPU pooler does?

Out of Scope

Background, and strategic fit

This issue has recently been addressed by OpenStack.

Assumptions

Customer Considerations

Documentation Considerations

The feature needs documentation on how to configure OCP, create pods, and troubleshoot.

Epic CNF-9117: Mixed-CPUs for container workloads implementation

Epic Goal

An NRI plugin that is invoked by CRI-O right before container creation and updates the container's cpuset and quota to match the mixed-cpus request.
The CPU pinning reconciliation operation must also execute the NRI API call on every update, so that we can intercept the kubelet and it does not overwrite our changes.
Dev Preview for 4.15
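As a rough illustration of the cpuset update the plugin has to perform, the following Go sketch computes the union of a container's exclusive CPUs and the shared CPU pool in Linux cpuset list syntax (for example "0-3,8"). It does not use the real NRI API (github.com/containerd/nri) and does not adjust the CPU quota; the function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseCPUList expands a Linux cpuset list such as "0-3,8" into CPU IDs.
func parseCPUList(s string) ([]int, error) {
	var cpus []int
	if strings.TrimSpace(s) == "" {
		return cpus, nil
	}
	for _, part := range strings.Split(s, ",") {
		part = strings.TrimSpace(part)
		lo, hi, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(lo)
		if err != nil {
			return nil, fmt.Errorf("invalid cpu %q: %w", part, err)
		}
		end := start
		if isRange {
			if end, err = strconv.Atoi(hi); err != nil || end < start {
				return nil, fmt.Errorf("invalid cpu range %q", part)
			}
		}
		for c := start; c <= end; c++ {
			cpus = append(cpus, c)
		}
	}
	return cpus, nil
}

// formatCPUList sorts, de-duplicates and compresses CPU IDs into range syntax.
func formatCPUList(cpus []int) string {
	sorted := append([]int(nil), cpus...)
	sort.Ints(sorted)
	var parts []string
	for i := 0; i < len(sorted); {
		j := i
		// Extend the run while the next ID is a duplicate or consecutive.
		for j+1 < len(sorted) && sorted[j+1] <= sorted[j]+1 {
			j++
		}
		if sorted[i] == sorted[j] {
			parts = append(parts, strconv.Itoa(sorted[i]))
		} else {
			parts = append(parts, fmt.Sprintf("%d-%d", sorted[i], sorted[j]))
		}
		i = j + 1
	}
	return strings.Join(parts, ",")
}

// mixedCPUSet returns the union of a container's exclusive CPUs and the shared pool.
func mixedCPUSet(exclusive, shared string) (string, error) {
	excl, err := parseCPUList(exclusive)
	if err != nil {
		return "", err
	}
	shr, err := parseCPUList(shared)
	if err != nil {
		return "", err
	}
	return formatCPUList(append(excl, shr...)), nil
}

func main() {
	cpus, err := mixedCPUSet("2-3", "0-1")
	fmt.Println(cpus, err)
}
```

Keeping the merge as a pure function makes it easy to re-apply on every update, which is what the reconciliation requirement above asks for.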

Why is this important?

This would unblock many options, including mixed-CPU workloads where some CPUs could be shared among containers / pods ~~CNF-3706~~
This would also allow further research on dynamic (simulated) hyperthreading ~~CNF-3743~~

Scenarios

Acceptance Criteria

Have an NRI plugin that is called by the runtime and updates the container with the shared (mutual) CPUs.
The plugin must be able to override the CPU manager reconciliation loop and be immune to future CPU manager changes.
The plugin must be robust and handle node reboot and kubelet/CRI-O restart scenarios.
upstream CI - MUST be running successfully with tests automated.
Release Technical Enablement - Provide necessary release enablement details and documents.
OCP adoption in relevant OCP version
NTO shall be able to deploy the new plugin

Dependencies (internal and external)

Previous Work (Optional):

https://issues.redhat.com/browse/CNF-3706 : Spike - mix of shared and pinned/dedicated cpus within a container
https://issues.redhat.com/browse/CNF-3743 : Spike: Dynamic offlining of cpu siblings to simulate no-smt
upstream Node Resource Interface project - https://github.com/containerd/nri
https://issues.redhat.com/browse/CNF-6082: [SPIKE] Cpus assigned hook point in CRI-O
https://issues.redhat.com/browse/CNF-7603

Open questions:

N/A

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Story CNF-8809: kubernetes: node admission plugin for shared cpus

We need to extend the node admission plugin to support shared CPUs.

The admission plugin should provide the following functionality:
1. If a user requests more than a single `openshift.io/enabled-shared-cpus` resource, it rejects the pod request with an error explaining to the user how to fix their pod spec.
2. It adds an annotation `cpu-shared.crio.io` that will be used to tell the runtime that shared CPUs were requested.
For every container requested for shared cpus, it adds an annotation with the following scheme:
`cpu-shared.crio.io/<container name>`
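A simplified Go sketch of the two admission rules above. The `container` type is a hypothetical stand-in for the Kubernetes `corev1.Container`, and the annotation value "enable" is an assumption, since the story only specifies the `cpu-shared.crio.io/<container name>` key scheme.

```go
package main

import "fmt"

const (
	sharedCPUResource         = "openshift.io/enabled-shared-cpus"
	sharedCPUAnnotationPrefix = "cpu-shared.crio.io/"
	// Hypothetical value; the story only defines the annotation key.
	sharedCPUAnnotationValue = "enable"
)

// container is a hypothetical, simplified stand-in for corev1.Container.
type container struct {
	Name   string
	Limits map[string]int64
}

// admitSharedCPUs rejects requests for more than one shared-CPU resource and
// returns the per-container annotations to add to the pod.
func admitSharedCPUs(containers []container) (map[string]string, error) {
	annotations := map[string]string{}
	for _, c := range containers {
		qty, ok := c.Limits[sharedCPUResource]
		if !ok || qty == 0 {
			continue // this container did not ask for shared CPUs
		}
		if qty > 1 {
			return nil, fmt.Errorf("container %q requests %d %s; request exactly 1 to get access to the shared CPUs",
				c.Name, qty, sharedCPUResource)
		}
		annotations[sharedCPUAnnotationPrefix+c.Name] = sharedCPUAnnotationValue
	}
	return annotations, nil
}

func main() {
	ann, err := admitSharedCPUs([]container{
		{Name: "app", Limits: map[string]int64{sharedCPUResource: 1}},
		{Name: "sidecar"},
	})
	fmt.Println(ann, err)
}
```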

Example of how it's done for core pinning: https://github.com/openshift/kubernetes/commit/04ff5090bae1cb181a2464696adde8709cdd0a93

https://github.com/openshift/kubernetes/pull/1799

Story CNF-8326: kubelet: advertise shared cpu as extended resources

We need to add support to the kubelet to advertise the shared CPUs as the `openshift.io/enabled-shared-cpus` extended resource.

This should be off by default and only activated when a configuration file is supplied.

https://github.com/openshift/kubernetes/pull/1795

Story CNF-10479: cluster-config: bump deps

Bump cluster-config-operator to pull in the mixed-cpus feature-gate API.

https://github.com/openshift/cluster-config-operator/pull/386

Story CNF-10473: nto: add feature gates support

To protect the operator (and the cluster in general) from Tech Preview (TP) features, we should add feature gate support to NTO.

https://github.com/openshift/cluster-node-tuning-operator/pull/858

Story CNF-7610: NTO/PAO: enable the mixed cpus feature

The feature enablement is done through the performance profile.

We should follow what is described in the EP ( https://github.com/openshift/enhancements/pull/1396) and add everything needed in NTO to activate the feature.

https://github.com/openshift/cluster-node-tuning-operator/pull/853

Feature TELCOSTRAT-58: Ensure Telco Far Edge is "Ready" when OCP GAs

Feature Overview

To give Telco Far Edge customers as much of the product support lifespan as possible, we need to ensure that OCP releases are "telco ready" when the OCP release is GA.

Goals

All Telco Far Edge regression tests pass prior to OCP GA
All new features that are TP or GA quality at the time of the release pass validation prior to OCP GA
Ensure Telco Far Edge KPIs are met prior to OCP GA

Requirements

Requirement	Notes	isMvp?
CI - MUST be running successfully with test automation	This is a requirement for ALL features.	YES
Release Technical Enablement	Provide necessary release enablement details and documents.	YES

(Optional) Use Cases

This Section:

SNO DU
C-RAN Hub of DUs on compact cluster or traditional cluster
CU on compact cluster or traditional cluster

Questions to answer…

What are the scale goals?
How many nodes must be provisioned simultaneously?

Out of Scope

Background, and strategic fit

Notes

Initial refinement meeting notes

Assumptions

Customer Considerations

Documentation Considerations

No documentation required

Epic OCPEDGE-42: Expand kernel-rt testing to include latency tests

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

Epic Goal

Create a new informing lane in CI which includes the functional tests from ~~OCPVE-163~~ and the latency tests from the RHEL Shift Left initiative started by the Telco RAN team.
Enable the informing lane to run on bare metal on a consistent set of hardware.

Why is this important?

We document that running workloads on OpenShift with a realtime kernel works, but in practice testing is often done in a bespoke fashion by teams outside of OpenShift. This epic seeks to close the gap in automated integration testing of OpenShift running a realtime kernel on bare-metal hardware.

Scenarios

https://docs.google.com/presentation/d/1NW8vEkP7zMd0vxWpD-p82srZljcAOtqLEhymuYcUQXQ/edit#slide=id.g1407d815407_0_5

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
...

Dependencies (internal and external)

Previous Work (Optional):

Open questions:

What is the cost/licensing requirements for metal hardware (Equinix?) to support this new lane?
How many jobs do we run, and how often?
How do we integrate the metal hardware with Prow?
Who should own this lane long term?
Does OpenShift make any performance guarantees/promises?

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Story OCPEDGE-548: Implement rteval test

Implement the rteval test in the openshift-test binary under the openshift/nodes/realtime test suite

Feature TELCOSTRAT-87: Single Core CPU CaaS Budget for DU Deployment w/ Single-Node OpenShift on Sapphire Rapids Platform

Feature Overview

Reduce the OpenShift platform and associated RH-provided components to a single physical core on the Intel Sapphire Rapids platform for vDU deployments on Single-Node OpenShift.

Goals

Reduce CaaS platform compute needs so that it can fit within a single physical core with Hyperthreading enabled (i.e. 2 CPUs).
Ensure existing DU Profile components fit within reduced compute budget.
Ensure existing ZTP, TALM, Observability and ACM functionality is not affected.
Ensure largest partner vDU can run on Single Core OCP.
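To make the 2-CPU budget concrete, this Go sketch splits a node's logical CPUs into a reserved set (one physical core, that is, CPU 0 and its hyperthread sibling) and an isolated set, as they might be fed into a Performance Profile's `spec.cpu.reserved` and `spec.cpu.isolated` fields. It assumes the common enumeration in which the sibling of CPU i is CPU i + N/2; that is not guaranteed, so check `/sys/devices/system/cpu/cpu0/topology/thread_siblings_list` on the actual hardware. `splitSingleCore` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strconv"
)

// splitSingleCore returns reserved and isolated CPU lists for a node with
// `total` logical CPUs, ASSUMING the HT sibling of CPU i is CPU i+total/2.
func splitSingleCore(total int) (reserved, isolated string, err error) {
	if total < 4 || total%2 != 0 {
		return "", "", fmt.Errorf("expected an even number of logical CPUs >= 4, got %d", total)
	}
	half := total / 2
	reserved = fmt.Sprintf("0,%d", half) // one physical core: CPU 0 + its sibling
	isolated = rangeStr(1, half-1) + "," + rangeStr(half+1, total-1)
	return reserved, isolated, nil
}

// rangeStr formats an inclusive CPU range in cpuset list syntax.
func rangeStr(lo, hi int) string {
	if lo == hi {
		return strconv.Itoa(lo)
	}
	return fmt.Sprintf("%d-%d", lo, hi)
}

func main() {
	reserved, isolated, err := splitSingleCore(64)
	fmt.Println("reserved:", reserved, "isolated:", isolated, err)
}
```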

Requirements

Requirement	Notes	isMvp?
CI - MUST be running successfully with test automation	This is a requirement for ALL features.	YES
Release Technical Enablement	Provide necessary release enablement details and documents.	YES
Provide a mechanism to tune the platform to use only one physical core.	Users need to be able to tune different platforms.	YES
Allow for full zero touch provisioning of a node with the minimal core budget configuration.	Node provisioned with SNO Far Edge provisioning method - i.e. ZTP via RHACM, using DU Profile.	YES
Platform meets all MVP KPIs		YES

(Optional) Use Cases

Main success scenario: A telecommunications provider uses ZTP to provision a vDU workload on a Single Node OpenShift instance running on an Intel Sapphire Rapids platform. The SNO is managed by an ACM instance and its lifecycle is managed by TALM.

Questions to answer...

Out of Scope

Core budget reduction on the Remote Worker Node deployment model.

Background, and strategic fit

Assumptions

The more compute power available for RAN workloads directly translates to the volume of cell coverage that a Far Edge node can support.
Telecommunications providers want to maximize the cell coverage on Far Edge nodes.
To provide as much compute power as possible the OpenShift platform must use as little compute power as possible.
As newer generations of servers are deployed at the Far Edge and the core count increases, no additional cores will be given to the platform for basic operation; all resources will be given to the workloads.

Customer Considerations

Documentation Considerations

Questions to be addressed:

What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
- Administrators must know how to tune their Far Edge nodes to make them as computationally efficient as possible.

Does this feature have doc impact?
- Possibly, there should be documentation describing how to tune the Far Edge node such that the platform uses as little compute power as possible.

New Content, Updates to existing content, Release Note, or No Doc Impact
- Probably updates to existing content

If unsure and no Technical Writer is available, please contact Content Strategy.

What concepts do customers need to understand to be successful in [action]?
- Performance Addon Operator, tuned, MCO, Performance Profile Creator

How do we expect customers will use the feature? For what purpose(s)?
- Customers will use the Performance Profile Creator to tune their Far Edge nodes. They will use RHACM (ZTP) to provision a Far Edge Single-Node OpenShift deployment with the appropriate Performance Profile.

What reference material might a customer want/need to complete [action]?
- Performance Addon Operator, Performance Profile Creator

Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
- N/A

What is the doc impact (New Content, Updates to existing content, or Release Note)?
- Likely updates to existing content / unsure

Epic OCPEDGE-69: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Story OCPEDGE-460: Bump api at Installer repo with new cap name

Story OCPEDGE-365: Bump api at CVO and fix tests

Story OCPEDGE-376: Annotate cloud-credential-operator manifests with cap name

Epic OCPEDGE-41: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Story OCPEDGE-469: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Story OCPEDGE-484: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Story OCPEDGE-386: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Story OCPEDGE-476: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Story OCPEDGE-539: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Story OCPEDGE-481: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Story OCPEDGE-579: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Feature TELCOSTRAT-97: BIOS and Firmware Upgrade/Downgrade prior to Zero Touch Provisioning

Feature Overview

Telco customers delivering servers to far edge sites (D-RAN DU) need the ability to upgrade or downgrade the servers' BIOS and firmware to specific versions to ensure each server is configured as in their validated pattern. After the bare metal is delivered to the site, this should be done prior to using ZTP to provision the node.

Goals

Allow the operator, via GitOps, to specify the BIOS image and any firmware images to be installed prior to ZTP
Seamless transition from pre-provisioning to existing ZTP solution
Integrate BIOS/Firmware upgrades/downgrades into TALM
(consider) integration with backup/restore recovery feature
Firmware could include: NICs, accelerators, GPUs/DPUs/IPUs

Requirements

This section: a list of specific needs or objectives that a Feature must deliver to be satisfied. Some requirements will be flagged as MVP. If an MVP requirement slips, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.

Requirement	Notes	isMvp?
CI - MUST be running successfully with test automation	This is a requirement for ALL features.	YES
Release Technical Enablement	Provide necessary release enablement details and documents.	YES

(Optional) Use Cases

This Section:

Main success scenarios - high-level user stories
Alternate flow/scenarios - high-level user stories
...

Questions to answer…

Can we provide the ability to fail back to a well known good firmware image if the upgrade/reboot fails?

Out of Scope

Background, and strategic fit

This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

Assumptions

Customer Considerations

Documentation Considerations

Questions to be addressed:

What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
Does this feature have doc impact?
New Content, Updates to existing content, Release Note, or No Doc Impact
If unsure and no Technical Writer is available, please contact Content Strategy.
What concepts do customers need to understand to be successful in [action]?
How do we expect customers will use the feature? For what purpose(s)?
What reference material might a customer want/need to complete [action]?
Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
What is the doc impact (New Content, Updates to existing content, or Release Note)?

Epic METAL-134: Hardware RAID support on Dell with Metal3 via Redfish

Goal

Hardware RAID support on Dell with Metal3.

Why is this important

Before implementing generic support, we need to understand the implications of enabling an interface in Metal3 to allow it on multiple hardware types.

Scope questions

Changes in Ironic?
- Maybe, but it should be there?
Changes in Metal3?
- Hopefully small, it should be there for Fujitsu
Changes in OpenShift?
- Hopefully small, it should be there for Fujitsu
Spec/Design/Enhancements?
- Ironic: no (supports this already)
- Metal3: no (ditto)
- OpenShift:
  https://github.com/openshift/enhancements/blob/master/enhancements/baremetal/baremetal-config-raid-and-bios.md
Dependencies on other teams?
- No

Task METAL-803: Re-vendor BMO in the installer to pick up RAID changes

https://github.com/openshift/installer/pull/7809

Task METAL-829: Update cpu_arch with bmh.Spec.Architecture

While rendering BMO in https://issues.redhat.com/browse/METAL-829, the node cpu_arch was hardcoded to x86_64.

We should use bmh.Spec.Architecture instead to be more future-proof.
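A minimal Go sketch of the proposed change, using hypothetical trimmed-down types in place of the real BareMetalHost API: take the architecture from the host spec and keep x86_64 only as a fallback when the field is empty (the fallback is an assumption, chosen to preserve the previous behaviour).

```go
package main

import "fmt"

// Hypothetical, trimmed-down stand-ins for the BareMetalHost types.
type bmhSpec struct{ Architecture string }
type bareMetalHost struct{ Spec bmhSpec }

// defaultCPUArch preserves the previously hardcoded value when the spec is empty.
const defaultCPUArch = "x86_64"

// cpuArch prefers bmh.Spec.Architecture over the hardcoded default.
func cpuArch(bmh bareMetalHost) string {
	if bmh.Spec.Architecture != "" {
		return bmh.Spec.Architecture
	}
	return defaultCPUArch
}

func main() {
	fmt.Println(cpuArch(bareMetalHost{Spec: bmhSpec{Architecture: "aarch64"}}))
	fmt.Println(cpuArch(bareMetalHost{}))
}
```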

https://github.com/openshift/installer/pull/7814

Story METAL-376: Re-enable Redfish RAID in the downstream BMO fork

This section includes Jira cards that are not linked to either an Epic or a Feature. These tickets were completed when this image was assembled.

Bug OCPBUGS-21724: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/csi-driver-shared-resource-operator/pull/85

Bug OCPBUGS-22868: accessTokenInactivityTimeout field is required when configuring oauth identity providers

Description of problem:

Configuring OAuth identity providers in the HostedCluster fails when accessTokenInactivityTimeout is not set.

Version-Release number of selected component (if applicable):

How reproducible:

always

Steps to Reproduce:

1. Create a cluster
2. Configure htpasswd without the accessTokenInactivityTimeout field in the HostedCluster CR
3. The change fails to apply

Actual results:

jiezhao-mac:hypershift jiezhao$ oc get hostedcluster -n clusters -o yaml > cluster.yaml 
  spec:
    configuration:
      oauth:
        identityProviders:
        - htpasswd:
            fileData:
              name: htpass-secret
          mappingMethod: claim
          name: my_htpasswd_provider
          type: HTPasswd
      secretRefs:
      - name: htpass-secret
jiezhao-mac:hypershift jiezhao$ oc apply -f cluster.yaml
The HostedCluster "jie-test" is invalid: spec.configuration.oauth: Invalid value: "object": no such key: tokenConfig evaluating rule: spec.configuration.oauth.tokenConfig.accessTokenInactivityTimeout minimum acceptable token timeout value is 300 seconds

Expected results:

htpasswd should be configured successfully without the accessTokenInactivityTimeout field.

Additional info:

When accessTokenInactivityTimeout is set to 300s, htpasswd is configured in the HostedCluster successfully.

jiezhao-mac:hypershift jiezhao$ oc get hostedcluster -n clusters -o yaml > cluster.yaml

  spec:
    configuration:
      oauth:
        identityProviders:
        - htpasswd:
            fileData:
              name: htpass-secret
          mappingMethod: claim
          name: my_htpasswd_provider
          type: HTPasswd
        tokenConfig:
          accessTokenInactivityTimeout: 300s
      secretRefs:
      - name: htpass-secret

jiezhao-mac:hypershift jiezhao$ oc apply -f cluster.yaml 
hostedcluster.hypershift.openshift.io/jie-test configured
jiezhao-mac:hypershift jiezhao$ 

jiezhao-mac:hypershift jiezhao$ oc get hostedcluster/jie-test -n clusters -ojsonpath='{.spec.configuration}' | jq
{
  "oauth": {
    "identityProviders": [
      {
        "htpasswd": {
          "fileData": {
            "name": "htpass-secret"
          }
        },
        "mappingMethod": "claim",
        "name": "my_htpasswd_provider",
        "type": "HTPasswd"
      }
    ],
    "tokenConfig": {
      "accessTokenInactivityTimeout": "300s"
    }
  }
}

https://github.com/openshift/hypershift/pull/3157

Bug OCPBUGS-8777: oauth-server with a single non-login identity provider creates a fail loop with console

Description of problem:
When configured with a single identity provider that is not capable of login authentication flows, the oauth-server returns an error when accessed from the browser. When the oauth-server is accessed from the web console, this error causes a redirect loop between the oauth-server and the console.

Version-Release number of selected component (if applicable):
4.5

How reproducible:
100%

Steps to Reproduce:
1. Configure a request header IdP with a bogus ChallengeURL and no LoginURL
2. Disable the kubeadmin user by deleting the kube-system/kubeadmin secret
3. Wait for the changes to be applied to the oauth-server's deployment
4. Go to the console's URL

Actual results:
The console tries to access a resource, gets an "unauthorized" error, and redirects the user to the oauth-server; the oauth-server errors out because it does not allow browser login and redirects the user back to the console, and the loop repeats infinitely.

Expected results:
The oauth-server presents the user with a login page that won't allow them to log in OR the server errors out with a clear error that tells the console not to try to loop back to it again.

https://github.com/openshift/console/pull/13102

Bug OCPBUGS-25725: ManagedBootImages: failed to fetch architecture type of machineset no linked machine found

Description of problem:

When testing PR https://github.com/openshift/machine-config-operator/pull/4083, there is a machineset that does not have any linked machines.

$ oc get machineset/rioliu-1220c-bz2gp-worker-f -n openshift-machine-api
NAME                          DESIRED   CURRENT   READY   AVAILABLE   AGE
rioliu-1220c-bz2gp-worker-f   0         0                             3h47m

Many errors like the following are found in the MCD log:

I1220 09:15:59.743704       1 machine_set_boot_image_controller.go:211] Error syncing machineset openshift-machine-api/rioliu-1220c-bz2gp-worker-f: failed to fetch architecture type of machineset rioliu-1220c-bz2gp-worker-f, err: could not find any machines linked to machineset, error: %!w(<nil>)

The machineset patch is skipped in the reconcile loop due to the above error, so the boot image info cannot be patched even though the machineset does not have any machines provisioned.

Version-Release number of selected component (if applicable):

How reproducible:

Consistently

Steps to Reproduce:

https://github.com/openshift/machine-config-operator/pull/4083#issuecomment-1864226629

Actual results:

The machineset is skipped in the reconcile loop due to the above error, so the boot image info cannot be patched.

Expected results:

The machineset should be updated even if no linked machine is found, because it may have been scaled down to 0 replicas.

Additional info:

https://github.com/openshift/machine-config-operator/pull/4088

Bug MGMT-15598: System generated manifests are not gathered by assisted-test-infra

Description of the problem:
When gathering manifests for a cluster from assisted-installer using assisted-test-infra any 'system generated' manifests are not listed.

How reproducible:
Look at any recently created triage ticket; you will notice that the `system-generated` manifests are missing.

Actual results:
Only user-generated manifests are shown by assisted-test-infra

Expected results:
System generated manifests as well as user generated manifests should be listed by assisted-test-infra

https://github.com/openshift/assisted-service/pull/5498

Bug OCPBUGS-19258: Update 4.15 egress-router-cni image to be consistent with ART

Please review the following PR: https://github.com/openshift/egress-router-cni/pull/76

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/egress-router-cni/pull/76

Bug OCPBUGS-24116: Update 4.15 ose-machine-api-provider-aws-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/machine-api-provider-aws/pull/93

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/machine-api-provider-aws/pull/93

Bug OCPBUGS-26605: e2e-gcp-op-layering CI job continuously failing

Description of problem:

The e2e-gcp-op-layering CI job seems to be continuously and consistently failing during the teardown process. In particular, it appears to be the TestOnClusterBuildRollsOutImage test that is failing whenever it attempts to tear down the node. See: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/4060/pull-ci-openshift-machine-config-operator-master-e2e-gcp-op-layering/1744805949165539328 for an example of a failing job.

Version-Release number of selected component (if applicable):

How reproducible:

Always

Steps to Reproduce:

Open a PR to the GitHub MCO repository.

Actual results:

The teardown portion of the TestOnClusterBuildRollsOutImage test fails as follows:

  utils.go:1097: Deleting machine ci-op-v5qcditr-46b3f-bh29c-worker-c-fcl9f / node ci-op-v5qcditr-46b3f-bh29c-worker-c-fcl9f
    utils.go:1098: 
            Error Trace:    /go/src/github.com/openshift/machine-config-operator/test/helpers/utils.go:1098
                                        /go/src/github.com/openshift/machine-config-operator/test/e2e-layering/onclusterbuild_test.go:103
                                        /go/src/github.com/openshift/machine-config-operator/test/e2e-layering/helpers_test.go:149
                                        /go/src/github.com/openshift/machine-config-operator/test/helpers/utils.go:79
                                        /usr/lib/golang/src/testing/testing.go:1150
                                        /usr/lib/golang/src/testing/testing.go:1328
                                        /usr/lib/golang/src/testing/testing.go:1570
            Error:          Received unexpected error:
                            exit status 1
            Test:           TestOnClusterBuildRollsOutImage
    utils.go:1097: Deleting machine ci-op-v5qcditr-46b3f-bh29c-worker-c-fcl9f / node ci-op-v5qcditr-46b3f-bh29c-worker-c-fcl9f
    utils.go:1098: 
            Error Trace:    /go/src/github.com/openshift/machine-config-operator/test/helpers/utils.go:1098
                                        /go/src/github.com/openshift/machine-config-operator/test/e2e-layering/onclusterbuild_test.go:103
                                        /go/src/github.com/openshift/machine-config-operator/test/e2e-layering/helpers_test.go:149
                                        /go/src/github.com/openshift/machine-config-operator/test/helpers/utils.go:79
                                        /usr/lib/golang/src/testing/testing.go:1150
                                        /usr/lib/golang/src/testing/testing.go:1328
                                        /usr/lib/golang/src/testing/testing.go:1312
                                        /usr/lib/golang/src/runtime/panic.go:522
                                        /usr/lib/golang/src/testing/testing.go:980
                                        /go/src/github.com/openshift/machine-config-operator/test/helpers/utils.go:1098
                                        /go/src/github.com/openshift/machine-config-operator/test/e2e-layering/onclusterbuild_test.go:103
                                        /go/src/github.com/openshift/machine-config-operator/test/e2e-layering/helpers_test.go:149
                                        /go/src/github.com/openshift/machine-config-operator/test/helpers/utils.go:79
                                        /usr/lib/golang/src/testing/testing.go:1150
                                        /usr/lib/golang/src/testing/testing.go:1328
                                        /usr/lib/golang/src/testing/testing.go:1570
            Error:          Received unexpected error:
                            exit status 1
            Test:           TestOnClusterBuildRollsOutImage

Expected results:

This part of the test should pass.

Additional info:

The way the test teardown process currently works is that it shells out to the oc command to delete the underlying Machine and Node. We delete the underlying machine and node so that the cloud provider will provision us a new one due to issues with opting out of on-cluster builds that have yet to be resolved.

At the time this test was written, it was implemented this way to avoid having to vendor the Machine client and API into the MCO codebase, which has since happened. I suspect that oc is failing in some way, since we get exit status 1 from where it is invoked. Now that the Machine client and API are vendored into the MCO codebase, it makes more sense to use them directly instead of shelling out to oc, since we would get more verbose error messages.

https://github.com/openshift/machine-config-operator/pull/4110

Bug OCPBUGS-19287: Update 4.15 openshift-enterprise-haproxy-router image to be consistent with ART

Please review the following PR: https://github.com/openshift/router/pull/513

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/router/pull/513

Bug OCPBUGS-21830: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-openshift-controller-manager-operator/pull/308

Bug OCPBUGS-24787: Update 4.16 ose-cluster-samples-operator-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-samples-operator/pull/527

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Story GITOPS-3575: [DynamicPlugin] Set a flag to toggle static plugin

Story (Required)

As a developer releasing the GitOps Dynamic Plugin, I want a flag to toggle the static plugin, so that it is possible to fall back to the old static plugin.

Background (Required)

The reason for this ticket is that OCP will have a release that keeps the static plugin as a fallback.

Slack thread: https://redhat-internal.slack.com/archives/C011BL0FEKZ/p1698853635030619

Related to ~~GITOPS-2369~~: [DynamicPlugin] Remove static plugin from Console

Out of scope

<Defines what is not included in this story>

Approach (Required)

Set up a flag initialized by the dynamic plugin and disable the static plugin when the flag is set.

Dependencies

<Describes what this story depends on. Dependent Stories and EPICs should be linked to the story.>

Acceptance Criteria (Mandatory)

Only one of the static plugin and the dynamic plugin is displayed in the console.
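
A minimal Python sketch of the approach and acceptance criterion above, assuming a simple in-memory flag store. FeatureFlags, the flag name GITOPS_DYNAMIC and the plugin labels are hypothetical; the console's real flag mechanism is not shown. The point is that exactly one plugin is rendered, with the static one used only as a fallback.

```python
# Hypothetical flag name; the console's real flag is not named in this story.
GITOPS_DYNAMIC_PLUGIN_FLAG = "GITOPS_DYNAMIC"


class FeatureFlags:
    """Minimal in-memory flag store; unset flags read as False."""

    def __init__(self):
        self._flags = {}

    def set(self, name, value=True):
        self._flags[name] = value

    def is_set(self, name):
        return self._flags.get(name, False)


def on_dynamic_plugin_loaded(flags):
    # The dynamic plugin initializes the flag when the console loads it.
    flags.set(GITOPS_DYNAMIC_PLUGIN_FLAG)


def visible_gitops_plugins(flags):
    # Exactly one plugin is rendered: the static plugin is the fallback,
    # used only while the dynamic plugin has not set the flag.
    if flags.is_set(GITOPS_DYNAMIC_PLUGIN_FLAG):
        return ["dynamic"]
    return ["static"]
```

With a fresh FeatureFlags, visible_gitops_plugins returns ["static"]; after on_dynamic_plugin_loaded it returns ["dynamic"].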

INVEST Checklist

Dependencies identified

Blockers noted and expected delivery timelines set

Design is implementable

Acceptance criteria agreed upon

Story estimated

Done Checklist

Code is completed, reviewed, documented and checked in
Unit and integration test automation have been delivered and running cleanly in continuous integration/staging/canary environment
Continuous Delivery pipeline(s) is able to proceed with new code included
Customer facing documentation, API docs etc. are produced/updated, reviewed and published
Acceptance criteria are met

https://github.com/openshift/console/pull/13307

Bug OCPBUGS-24125: Update 4.15 ose-cluster-baremetal-operator-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-baremetal-operator/pull/392

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-baremetal-operator/pull/392

Bug OCPBUGS-26068: IP and CIDR CEL validation for OpenShift 4.15

Description of problem:

We would like to include the CEL IP and CIDR validations in 4.16. They have been merged upstream and can be backported into OpenShift to improve our validation downstream.

Upstream PR: https://github.com/kubernetes/kubernetes/pull/121912
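
To illustrate what such validation checks, here is a Python stand-in built on the standard ipaddress module. It is not the CEL library from the upstream PR, and its parsing rules (for example around leading zeros or IPv4-mapped IPv6 addresses) may differ from what Kubernetes accepts; is_ip and is_cidr are hypothetical names.

```python
import ipaddress


def is_ip(value: str) -> bool:
    """True if value parses as a single IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


def is_cidr(value: str) -> bool:
    """True if value is an address/prefix-length pair such as 10.0.0.0/8.

    ip_network() also accepts a bare address (as a /32 or /128), so the
    explicit "/" check keeps plain IPs from passing as CIDRs. strict=False
    tolerates host bits being set (e.g. 10.0.0.1/8); whether the CEL
    library accepts that is not confirmed here.
    """
    if "/" not in value:
        return False
    try:
        ipaddress.ip_network(value, strict=False)
    except ValueError:
        return False
    return True
```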

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/kubernetes/pull/1843

Bug OCPBUGS-26406: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/machine-api-operator/pull/1196

Bug OCPBUGS-19179: Update 4.15 ose-machine-config-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/machine-config-operator/pull/3919

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/machine-config-operator/pull/3919

Bug OCPBUGS-23862: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/ibm-vpc-block-csi-driver-operator/pull/88

Bug OCPBUGS-23737: Hypershift requires access to cluster-machine-approver metrics

Description of problem:
OCPCLOUD-2277 restricted access to the cluster-machine-approver (CMA) metrics, which caused a regression in the HyperShift e2e tests. In the long term, HyperShift will likely remove that dependency, but to get things working again we plan to revert the CMA change until the dependency can be removed.

A PR removing the probes from HyperShift is being worked on.

https://github.com/openshift/hypershift/pull/3227

Story HOSTEDCP-1200: Remove no crashing pod test exceptions

User Story:

We should remove all exceptions added over time to https://github.com/openshift/hypershift/blob/860064d33f4729c2db3c68722d0b5a633e6d1bcd/test/e2e/util/util.go#L414

https://github.com/openshift/hypershift/pull/3138

Bug OCPBUGS-17534: cgroupv2 memory calculation is not accounted correctly

Description of problem:

https://github.com/kubernetes/kubernetes/issues/118916

Version-Release number of selected component (if applicable):

4.14

How reproducible:

100%

Steps to Reproduce:

1. Compare memory usage under cgroup v1 and v2 with the same workloads and note the differences.
2.
3.

Actual results:

The values differ slightly because of accounting differences.

Expected results:

The values should be largely the same.

Additional info:

https://github.com/openshift/kubernetes/pull/1711

Bug OCPBUGS-19546: YAML editor shows different style in console for configmaps with data exceeding 78 Characters

Testcases:

1. Create a configmap from a file with 77 characters in a line

File data:
tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt

CLI data:

$ oc get cm cm-test4 -o yaml
apiVersion: v1
data:
  cm-test4: |  # Note the literal style
    tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt
    eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
    ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
    tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt
kind: ConfigMap
metadata:
  creationTimestamp: "2022-09-28T12:39:43Z"
  name: cm-test4
  namespace: configmap-test
  resourceVersion: "8962738"
  uid: cf0e264b-72fb-4df7-bd3a-f3ed62423367


UI data:

kind: ConfigMap
apiVersion: v1
metadata:
  name: cm-test4
  namespace: configmap-test
  uid: cf0e264b-72fb-4df7-bd3a-f3ed62423367
  resourceVersion: '8962738'
  creationTimestamp: '2022-09-28T12:39:43Z'
  managedFields:
    - manager: kubectl-create
      operation: Update
      apiVersion: v1
      time: '2022-09-28T12:39:43Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:data':
          .: {}
          'f:cm-test4': {}
data:
  cm-test4: |  # Note the literal style
    tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt
    eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
    ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
    tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt

2. Create a configmap from a file with more than 78 characters in a line:

File Data:
tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt

CLI Data:

$ oc get cm cm-test5 -o yaml
apiVersion: v1
data:
  cm-test5: |  # Note the literal style
    tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt
    eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
    ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
    tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt
kind: ConfigMap
metadata:
  creationTimestamp: "2022-09-28T12:39:54Z"
  name: cm-test5
  namespace: configmap-test
  resourceVersion: "8962813"
  uid: b8b12653-588a-4afc-8ed9-ff7c6ebaefb1

UI data:

kind: ConfigMap
apiVersion: v1
metadata:
  name: cm-test5
  namespace: configmap-test
  uid: b8b12653-588a-4afc-8ed9-ff7c6ebaefb1
  resourceVersion: '8962813'
  creationTimestamp: '2022-09-28T12:39:54Z'
  managedFields:
    - manager: kubectl-create
      operation: Update
      apiVersion: v1
      time: '2022-09-28T12:39:54Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:data':
          .: {}
          'f:cm-test5': {}
data:
  cm-test5: >  # Note the folded style and the blank lines between data lines
    tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt

    eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee

    ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss

    tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt

Conclusion:

When the ConfigMap is created with more than 78 characters in a single line, the YAML editor in the web UI switches to the folded style, and blank lines appear between the data lines.
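
Assuming the editor follows YAML folding rules, the folded output is a presentation change rather than data loss. In a ">" block scalar a single line break folds into a space, while each empty line contributes one newline, so a dumper that chooses the folded style must insert a blank line between lines to keep them apart. The simplified Python decoder below (hypothetical helper names; it ignores more-indented lines and leading empty lines, which YAML treats specially) shows that the literal and folded renderings of cm-test5 decode to the same value.

```python
def decode_literal(lines):
    """Value of a '|' block scalar (clip chomping) from its de-indented lines."""
    # Line breaks are kept as-is; trailing empty lines collapse to one final newline.
    return "\n".join(lines).rstrip("\n") + "\n"


def decode_folded(lines):
    """Value of a '>' block scalar (clip chomping) from its de-indented lines.

    Simplified: assumes no more-indented lines and no leading empty lines.
    """
    parts = []
    empty_run = 0
    for line in lines:
        if line == "":
            empty_run += 1
            continue
        if parts:
            # A bare line break folds to a space; each empty line yields one "\n".
            parts.append("\n" * empty_run if empty_run else " ")
        parts.append(line)
        empty_run = 0
    return "".join(parts) + "\n"


if __name__ == "__main__":
    rows = [c * 78 for c in "test"]  # four long lines like those in cm-test5
    folded = [rows[0], "", rows[1], "", rows[2], "", rows[3]]
    print(decode_literal(rows) == decode_folded(folded))  # True
```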

https://github.com/openshift/console/pull/13182

Bug OCPBUGS-26591: [release-4.15] Web Console Shows Non-printable file detected

This is a clone of issue OCPBUGS-18699. The following is the description of the original issue:
—
Description of problem:

The OpenShift Console shows "Info alert:Non-printable file detected. File contains non-printable characters. Preview is not available." when editing a ConfigMap created from an XML file.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1. Create configmap from file:
# oc create cm test-cm --from-file=server.xml=server.xml
configmap/test-cm created

2. When editing the configmap in the OCP console, the following alert is shown:

Info alert:Non-printable file detected.
File contains non-printable characters. Preview is not available.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/console/pull/13499

Bug OCPBUGS-19165: Update 4.15 ose-nutanix-cloud-controller-manager image to be consistent with ART

Please review the following PR: https://github.com/openshift/cloud-provider-nutanix/pull/19

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cloud-provider-nutanix/pull/19

Bug OCPBUGS-19909: Build timing tests failing due to faster run times

Description of problem:

The build timing test fails on bare metal because build stages run faster than the test expects.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Run [sig-builds][Feature:Builds][timing] capture build stages and durations should record build stages and durations for docker
2.
3.

Actual results:

{  fail [github.com/openshift/origin/test/extended/builds/build_timing.go:101]: Stage PushImage ran for 95, expected greater than 100ms
Expected
    <bool>: true
to be false
Ginkgo exit error 1: exit with code 1}

Expected results:

Test should pass
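
The failure reduces to a fixed lower bound on stage duration: a PushImage stage that finishes in 95 ms on fast bare metal trips a "greater than 100ms" check. A Python sketch of that check, plus one hypothetical relaxation that only requires a recorded, positive duration; the function names are illustrative, and this is not necessarily what the linked origin PR does.

```python
def stage_too_short(duration_ms: int, min_ms: int = 100) -> bool:
    """Original-style check: the stage passes only if it ran longer than min_ms."""
    return duration_ms <= min_ms


def stage_recorded(duration_ms: int) -> bool:
    """Hypothetical relaxed check: only require that a duration was recorded."""
    return duration_ms > 0


if __name__ == "__main__":
    # 95 ms PushImage stage from the failing bare-metal run
    print(stage_too_short(95), stage_recorded(95))  # True True
```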

Additional info:

https://github.com/openshift/origin/pull/28288

Bug OCPBUGS-18864: Update 4.15 ironic-static-ip-manager image to be consistent with ART

Please review the following PR: https://github.com/openshift/ironic-static-ip-manager/pull/40

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/ironic-static-ip-manager/pull/40

Bug OCPBUGS-25367: OLM pod panics when EnsureSecretOwnershipAnnotations runs

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/operator-framework-olm/pull/636

Bug OCPBUGS-11710: Connection problems with OVN-Kubernetes on OpenShift Container Platform 4.12 on AWS post hibernation

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:

Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.

Affected Platforms:

Is it an

internal CI failure
customer issue / SD
internal RedHat testing failure

If it is an internal RedHat testing failure:

Please share a kubeconfig or creds to a live cluster for the assignee to debug/troubleshoot along with reproducer steps (especially if it's a telco use case like ICNI, secondary bridges or BM+kubevirt).

If it is a CI failure:

Did it happen in different CI lanes? If so please provide links to multiple failures with the same error instance
Did it happen in both sdn and ovn jobs? If so please provide links to multiple failures with the same error instance
Did it happen in other platforms (e.g. aws, azure, gcp, baremetal etc) ? If so please provide links to multiple failures with the same error instance
When did the failure start happening? Please provide the UTC timestamp of the networking outage window from a sample failure run
If it's a connectivity issue,
What is the srcNode, srcIP and srcNamespace and srcPodName?
What is the dstNode, dstIP and dstNamespace and dstPodName?
What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)

If it is a customer / SD issue:

Provide enough information in the bug description that Engineering doesn’t need to read the entire case history.
Don’t presume that Engineering has access to Salesforce.
Please provide must-gather and sos-report with an exact link to the comment in the support case with the attachment. The format should be: https://access.redhat.com/support/cases/#/case/<case number>/discussion?attachmentId=<attachment id>
Describe what each attachment is intended to demonstrate (failed pods, log errors, OVS issues, etc).
Referring to the attached must-gather, sosreport or other attachment, please provide the following details:
- If the issue is in a customer namespace then provide a namespace inspect.
- If it is a connectivity issue:
  - What is the srcNode, srcNamespace, srcPodName and srcPodIP?
  - What is the dstNode, dstNamespace, dstPodName and dstPodIP?
  - What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)
  - Please provide the UTC timestamp of the networking outage window from must-gather
  - Please provide tcpdump pcaps taken during the outage filtered based on the above provided src/dst IPs
- If it is not a connectivity issue:
  - Describe the steps taken so far to analyze the logs from networking components (cluster-network-operator, OVNK, SDN, openvswitch, ovs-configure etc) and the actual component where the issue was seen based on the attached must-gather. Please attach snippets of relevant logs around the window when problem has happened if any.

For OCPBUGS in which the issue has been identified, label with “sbr-triaged”
For OCPBUGS in which the issue has not been identified and needs Engineering help for root cause, label with “sbr-untriaged”
Note: bugs that do not meet these minimum standards will be closed with label “SDN-Jira-template”

https://github.com/openshift/ovn-kubernetes/pull/1939

Bug OCPBUGS-23131: IPv6 BMC cannot reach image on provisioning network

The final iteration (of 3) of the fix for ~~OCPBUGS-4248~~ - https://github.com/openshift/cluster-baremetal-operator/pull/341 - uses the (IPv6) API VIP as the IP address for IPv6 BMCs to contact Apache to download the image to mount via virtualmedia.

When the provisioning network is active, this should use the (IPv6) Provisioning VIP unless the virtualMediaViaExternalNetwork flag is true.
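
The selection rule in the previous sentence can be written as a small decision function. This is a Python sketch; image_server_ip, url_host and the parameter names are hypothetical and do not mirror cluster-baremetal-operator's code. It also brackets IPv6 literals, which is required when an address is used as the host part of a URL.

```python
import ipaddress


def image_server_ip(provisioning_network_active: bool,
                    virtual_media_via_external_network: bool,
                    api_vip: str,
                    provisioning_vip: str) -> str:
    """Pick the address BMCs use to download the virtual-media image.

    Use the provisioning VIP whenever the provisioning network is active,
    unless virtualMediaViaExternalNetwork is set; otherwise use the API VIP.
    """
    if provisioning_network_active and not virtual_media_via_external_network:
        return provisioning_vip
    return api_vip


def url_host(ip: str) -> str:
    """IPv6 literals must be wrapped in brackets inside a URL (RFC 3986)."""
    return f"[{ip}]" if ipaddress.ip_address(ip).version == 6 else ip
```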

https://github.com/openshift/cluster-baremetal-operator/pull/380

Bug OCPBUGS-27155: Switch to using new image for KAS container bootstrap

Description of problem:

Manifests will be removed from the CCO image, so we have to start using the CCA (cluster-config-api) image for bootstrap.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

  KAS bootstrap container fails

Expected results:

    KAS bootstrap container succeeds

Additional info:

https://github.com/openshift/hypershift/pull/3423

Bug OCPBUGS-24008: Agent integration test sno_arm fails

The sno_arm.txt integration test fails because it tries to extract arm64 PXE bits from the OKD release payload, which is x86_64.

AC:
Skip or remove the sno_arm.txt test.

https://github.com/openshift/installer/pull/7718

Bug OCPBUGS-26434: Ensure Passwords are Redacted in Agent Gather manifest Files

When platform-specific passwords are included in the install-config.yaml, they are stored in the generated agent-cluster-install.yaml, which is included in the output of the agent-gather command. These passwords should be redacted.
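
A minimal Python sketch of redacting password values from a parsed manifest before it is written into the agent-gather output. It assumes the manifest has already been loaded into nested dicts and lists; matching on key names that contain "password" and the "<redacted>" placeholder are assumptions, not the installer's actual rules, and redact_passwords is a hypothetical name.

```python
import copy

REDACTED = "<redacted>"  # hypothetical placeholder text


def redact_passwords(doc):
    """Return a deep copy of a parsed manifest with password values masked.

    Any mapping key containing "password" (case-insensitive) whose value is
    a string is masked, at any depth, including inside lists. The input is
    left untouched.
    """
    def walk(node):
        if isinstance(node, dict):
            for key, value in list(node.items()):
                if isinstance(key, str) and "password" in key.lower() and isinstance(value, str):
                    node[key] = REDACTED
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    result = copy.deepcopy(doc)
    walk(result)
    return result
```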

https://github.com/openshift/installer/pull/7873

Task OSASINFRA-3236: Speed up container deletion in cluster-destroy

Work has been done in Gophercloud; we now need to bump Gophercloud in Installer.

https://github.com/openshift/installer/pull/7208

Task MON-3286: Remove legacy code for 4.13->4.14 etcd ServiceMonitor migration

The code https://github.com/openshift/cluster-monitoring-operator/blob/91d735bd8662965037aae60c846c53baa79752ac/pkg/tasks/controlplane.go#L79-L93 makes sure CMO deletes the resources it used to manage.

The code was temporarily added in https://github.com/openshift/cluster-monitoring-operator/pull/2039/files

https://github.com/openshift/cluster-monitoring-operator/pull/2116

Bug OCPBUGS-24339: Husky pre-commit task fails after latest update

After updating our husky dependency, the pre-commit hook might fail on some systems if their PATH env var is not properly configured:

Running husky pre-commit hook...
frontend/.husky/pre-commit: line 6: lint-staged: command not found
husky - pre-commit hook exited with code 127 (error)
husky - command not found in PATH=<user path>

The PATH env var must include "./node_modules/.bin" for the husky pre-commit hook to work, which should be documented in the README.

Bug OCPBUGS-25605: pinned packages in ironic-image breaks ART pipeline

Because of the pinned versions in the packages list, the ART pipeline keeps rebuilding packages.
Unfortunately, we need to remove the strong pins and move back to relaxed ones.

Once that's done, we need to merge https://github.com/openshift-eng/ocp-build-data/pull/4097

https://github.com/openshift/ironic-image/pull/441

Bug OCPBUGS-21754: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/openshift-state-metrics/pull/103

Bug OCPBUGS-24092: Update 4.15 ose-openstack-cloud-controller-manager-container image to be consistent with ART

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/cloud-provider-openstack/pull/248

Bug OCPBUGS-26223: PKI Operator Starts Even When Hosted Cluster Is Annotated To Turn Off PKI

This is a clone of issue OCPBUGS-26197. The following is the description of the original issue:
—
Description of problem:

    The PKI operator runs even when the annotation to turn off PKI is set on the hosted control plane.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/hypershift/pull/3376

Bug OCPBUGS-17542: [Azure-Disk-CSI-Driver] Message correction for "The performancePlus flag can only be set on disks at least 512 GB in size"

Description of problem:

When performancePlus is enabled in a StorageClass for azure-disk-csi-driver, the volume size must be larger than 512 GB, but the message says "The performancePlus flag can only be set on disks at least 512 GB in size", which implies that 512 GB is supported. This confuses users.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Create sc with 
parameters:
  enablePerformancePlus: "true" 

2. Create pvc with 512Gi 

3. The following ProvisioningFailed message is returned, which is confusing:
  Warning  ProvisioningFailed    <invalid> (x5 over <invalid>)  disk.csi.azure.com_wduan0810manual-b5dng-master-1_d7a29bbf-3f49-4207-af33-056e0814f6e2  failed to provision volume with StorageClass "managed-csi-test-28-sssdlrs-enableperformanceplus": rpc error: code = Internal desc = Retriable: false, RetryAfter: 0s, HTTPStatusCode: 400, RawError: {
  "error": {
    "code": "BadRequest",
    "message": "The performancePlus flag can only be set on disks at least 512 GB in size."
  }
}

Actual results:

Expected results:

The message should say "larger than 512 GB" rather than "at least".
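
A Python sketch of a client-side check whose message matches the observed behavior. The 513 GiB minimum is an assumption inferred from this report (a 512Gi PVC is rejected); the constant and function names are hypothetical and not taken from azure-disk-csi-driver.

```python
# Hypothetical minimum inferred from the report: a 512Gi request is rejected,
# so the smallest accepted size is assumed to be 513 GiB.
PERFORMANCE_PLUS_MIN_SIZE_GIB = 513


def validate_performance_plus(size_gib: int, enable_performance_plus: bool):
    """Return an error message, or None if the request is acceptable."""
    if enable_performance_plus and size_gib < PERFORMANCE_PLUS_MIN_SIZE_GIB:
        return ("enablePerformancePlus requires a disk larger than 512 GiB "
                f"(requested {size_gib} GiB)")
    return None
```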

Additional info:

https://github.com/openshift/azure-disk-csi-driver/pull/52

Bug OCPBUGS-19227: Update 4.15 ose-kube-storage-version-migrator image to be consistent with ART

Please review the following PR: https://github.com/openshift/kubernetes-kube-storage-version-migrator/pull/199

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/kubernetes-kube-storage-version-migrator/pull/199

Bug OCPBUGS-22912: the value of ELB subnet tag should be 1 or empty, not true

Description of problem:

To make the AWS Load Balancer Operator work on HyperShift, one of the requirements is that the ELB tag is set on subnets; see https://github.com/openshift/aws-load-balancer-operator/blob/main/docs/prerequisites.md#vpc-and-subnets

The value of `kubernetes.io/role/elb` or `kubernetes.io/role/internal-elb` should be 1 or an empty string.

However, as the code below shows, HyperShift uses "true":

https://github.com/openshift/hypershift/blob/3e1db35d562d069797f9dec2b47227744f689684/cmd/infra/aws/ec2.go#L226

Version-Release number of selected component (if applicable):

How reproducible:

100%

Steps to Reproduce:

1. install hypershift cluster
2. check subnet tags
3.

Actual results:

value of `kubernetes.io/role/elb` is "true"

Expected results:

value of `kubernetes.io/role/elb` is 1 or ``
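
A Python sketch of checking subnet tags against the values required by the prerequisites document linked above. Tags are modeled as a plain dict rather than AWS SDK types, and invalid_role_tags is a hypothetical helper.

```python
ELB_ROLE_TAGS = ("kubernetes.io/role/elb", "kubernetes.io/role/internal-elb")
VALID_ROLE_VALUES = {"1", ""}  # accepted values per the prerequisites doc


def invalid_role_tags(tags):
    """Return {key: value} for ELB role tags whose value is not "1" or empty."""
    return {
        key: value
        for key, value in tags.items()
        if key in ELB_ROLE_TAGS and value not in VALID_ROLE_VALUES
    }
```

For the tags HyperShift currently creates, invalid_role_tags({"kubernetes.io/role/elb": "true"}) reports the tag as invalid.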

Additional info:

https://github.com/openshift/hypershift/pull/3198

Bug OCPBUGS-24084: Update 4.15 ose-gcp-pd-csi-driver-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/gcp-pd-csi-driver/pull/51

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/gcp-pd-csi-driver/pull/52

Bug OCPBUGS-19198: Update 4.15 ose-vmware-vsphere-csi-driver-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/vmware-vsphere-csi-driver-operator/pull/170

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Bug OCPBUGS-19220: Update 4.15 vmware-vsphere-syncer image to be consistent with ART

Please review the following PR: https://github.com/openshift/vmware-vsphere-csi-driver/pull/86

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/vmware-vsphere-csi-driver/pull/86

Bug OCPBUGS-19528: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-etcd-operator/pull/1126

Bug MGMT-15653: [BE] Domain with double -- (cat--rahul.com) rejected in network validation

Description of the problem:

A base domain containing a double hyphen (--), such as cat--rahul.com, is allowed by the UI and BE, but once a node is discovered, network validation fails.

This domain is one particular case; note that the UI and BE allow many consecutive hyphens as part of a domain name.

From the agent logs:

Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Creating execution step for ntp-synchronizer ntp-synchronizer-70565cf4 args <[{\"ntp_source\":\"\"}]>" file="step_processor.go:123" request_id=5467e025-2683-4119-a55a-976bb7787279
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Creating execution step for domain-resolution domain-resolution-f3917dea args <[{\"domains\":[{\"domain_name\":\"api.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"api-int.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"console-openshift-console.apps.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com.\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"quay.io\"}]}]>" file="step_processor.go:123" request_id=5467e025-2683-4119-a55a-976bb7787279
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Validating domain resolution with args [{\"domains\":[{\"domain_name\":\"api.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"api-int.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"console-openshift-console.apps.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com.\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"quay.io\"}]}]" file="action.go:29"
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Validating inventory with args [fea3d7b9-a990-48a6-9a46-4417915072b0]" file="action.go:29"
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=error msg="Failed to validate domain resolution: data, {\"domains\":[{\"domain_name\":\"api.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"api-int.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"console-openshift-console.apps.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com.\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"quay.io\"}]}" file="action.go:42" error="validation failure list:\nvalidation failure list:\ndomains.0.domain_name in body should match '^([a-zA-Z0-9]+(-[a-zA-Z0-9]+)*[.])+[a-zA-Z]{2,}[.]?$'"
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Validating ntp synchronizer with args [{\"ntp_source\":\"\"}]" file="action.go:29"
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Validating free addresses with args [[\"192.168.123.0/24\"]]" file="action.go:29"
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Executing nsenter [--target 1 --cgroup --mount --ipc --net -- sh -c cp /etc/mtab /root/mtab-fea3d7b9-a990-48a6-9a46-4417915072b0 && podman run --privileged --pid=host --net=host --rm --quiet -v /var/log:/var/log -v /run/udev:/run/udev -v /dev/disk:/dev/disk -v /run/systemd/journal/socket:/run/systemd/journal/socket -v /var/log:/host/var/log:ro -v /proc/meminfo:/host/proc/meminfo:ro -v /sys/kernel/mm/hugepages:/host/sys/kernel/mm/hugepages:ro -v /proc/cpuinfo:/host/proc/cpuinfo:ro -v /root/mtab-fea3d7b9-a990-48a6-9a46-4417915072b0:/host/etc/mtab:ro -v /sys/block:/host/sys/block:ro -v /sys/devices:/host/sys/devices:ro -v /sys/bus:/host/sys/bus:ro -v /sys/class:/host/sys/class:ro -v /run/udev:/host/run/udev:ro -v /dev/disk:/host/dev/disk:ro registry-proxy.engineering.redhat.com/rh-osbs/openshift4-assisted-installer-agent-rhel8:v1.0.0-279 inventory]" file="execute.go:39"
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=error msg="Unable to create runner for step <domain-resolution-f3917dea>, args <[{\"domains\":[{\"domain_name\":\"api.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"api-int.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"console-openshift-console.apps.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com.\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"quay.io\"}]}]>" file="step_processor.go:126" error="validation failure list:\nvalidation failure list:\ndomains.0.domain_name in body should match '^([a-zA-Z0-9]+(-[a-zA-Z0-9]+)*[.])+[a-zA-Z]{2,}[.]?$'" request_id=5467e025-2683-4119-a55a-976bb7787279
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Executing nsenter [--target 1 --cgroup --mount --ipc --net -- findmnt --raw --noheadings --output SOURCE,TARGET --target /run/media/iso]" file="execute.go:39"
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Executing nsenter [--target 1 --cgroup --mount --ipc --net -- sh -c podman ps --format '{{.Names}}' | grep -q '^free_addresses_scanner$' || podman run --privileged --net=host --rm --quiet --name free_addresses_scanner -v /var/log:/var/log -v /run/systemd/journal/socket:/run/systemd/journal/socket registry-proxy.engineering.redhat.com/rh-osbs/openshift4-assisted-installer-agent-rhel8:v1.0.0-279 free_addresses '[\"192.168.123.0/24\"]']" file="execute.go:39"
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Executing nsenter [--target 1 --cgroup --mount --ipc --net -- timeout 30 chronyc -n sources]" file="execute.go:39"
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=warning msg="Sending step <domain-resolution-f3917dea> reply output <> error <validation failure list:\nvalidation failure list:\ndomains.0.domain_name in body should match '^([a-zA-Z0-9]+(-[a-zA-Z0-9]+)*[.])+[a-zA-Z]{2,}[.]?$'> exit-code <-1>" file="step_processor.go:76" request_id=5467e025-2683-4119-a55a-976bb7787279

How reproducible:

Create a cluster with the domain cat--rahul.com, using the UI fix that allows it.

Once the node is discovered, network validation fails with:

DNS wildcard not configured: DNS wildcard check cannot be performed yet because the host has not yet performed DNS resolution.

Steps to reproduce:

see above

Actual results:

Unable to install the cluster due to the network validation failure

Expected results:
The domain should be accepted by the domain-name validation regex.
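
The validation failure comes from the domain-name pattern quoted in the log: every label must be alphanumeric runs joined by single hyphens, so labels such as dummy---dummy and cat--rahul are rejected. The Go sketch below checks that behavior and compares it with a hypothetical relaxed pattern (`-+` instead of `-`) that accepts consecutive hyphens while still rejecting a leading or trailing hyphen. It is an illustration only, not the pattern used by the actual fix.

```go
package main

import (
	"fmt"
	"regexp"
)

// originalPattern is the domain-name pattern quoted in the validation error.
// Each label is alphanumeric runs joined by single hyphens.
var originalPattern = regexp.MustCompile(`^([a-zA-Z0-9]+(-[a-zA-Z0-9]+)*[.])+[a-zA-Z]{2,}[.]?$`)

// relaxedPattern is a hypothetical variant: "-+" allows runs of hyphens inside
// a label, while labels must still start and end with an alphanumeric.
var relaxedPattern = regexp.MustCompile(`^([a-zA-Z0-9]+(-+[a-zA-Z0-9]+)*[.])+[a-zA-Z]{2,}[.]?$`)

func main() {
	domains := []string{
		"api.dummy---dummy.cat--rahul.com",
		"validateNoWildcardDNS.dummy---dummy.cat--rahul.com.",
		"quay.io",
		"-leading-hyphen.example.com",
	}
	for _, d := range domains {
		fmt.Printf("%s original=%v relaxed=%v\n",
			d, originalPattern.MatchString(d), relaxedPattern.MatchString(d))
	}
}
```

With the original pattern both cat--rahul.com names fail to match, quay.io matches under either pattern, and the leading-hyphen name is rejected by both.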

Bug OCPBUGS-12092: Update 4.14 ose-openstack-cinder-csi-driver image to be consistent with ART

Please review the following PR: https://github.com/openshift/cloud-provider-openstack/pull/195

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cloud-provider-openstack/pull/195

Bug OCPBUGS-17282: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cloud-provider-azure/pull/84

Bug OCPBUGS-22195: Capi provider reconciliation from old versions might fail to reconcile

Description of problem:

https://redhat-internal.slack.com/archives/C061SJRTKDG/p1697798046548799
In some OCM environments, the latest HyperShift Operator (HO) is stuck reconciling the CAPI provider for some 4.12 hosted clusters (HCs).

{"level":"error","ts":"2023-10-20T10:53:27Z","msg":"Reconciler error","controller":"hostedcluster","controllerGroup":"hypershift.openshift.io","controllerKind":"HostedCluster","HostedCluster":{"name":"build08","namespace":"ocm-production-23qm3j1pkslelghufgs874g86ccn5sba"},"namespace":"ocm-production-23qm3j1pkslelghufgs874g86ccn5sba","name":"build08","reconcileID":"482f297f-8afb-407c-96d9-bc1de727ef78","error":"failed to reconcile capi provider: failed to reconcile capi provider deployment: Deployment.apps \"capi-provider\" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"app\":\"capi-provider-controller-manager\", \"control-plane\":\"capi-provider-controller-manager\", \"hypershift.openshift.io/control-plane-component\":\"capi-provider-controller-manager\"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/opt/app-root/src/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:326\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/opt/app-root/src/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/opt/app-root/src/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234"}

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1.
2.
3.

Actual results:

reconciliation is stuck

Expected results:

reconciliation succeeds
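
The error is the API server rejecting a change to spec.selector, which is immutable on apps/v1 Deployments; a capi-provider Deployment created by an older operator with different selector labels therefore cannot be updated in place. A minimal Go sketch of one possible reconcile decision, assuming the controller deletes and recreates the Deployment when the selector differs. `decideAction` and the older selector are hypothetical, and this is not necessarily what the linked PR does.

```go
package main

import (
	"fmt"
	"reflect"
)

// deploymentAction is a hypothetical reconcile decision.
type deploymentAction string

const (
	actionCreate   deploymentAction = "create"
	actionUpdate   deploymentAction = "update"
	actionRecreate deploymentAction = "delete-and-recreate"
)

// decideAction picks how to converge an existing Deployment onto the desired
// one. spec.selector is immutable on apps/v1 Deployments, so a selector
// change cannot be sent as an update; the old object must be deleted first.
func decideAction(exists bool, currentSelector, desiredSelector map[string]string) deploymentAction {
	if !exists {
		return actionCreate
	}
	if !reflect.DeepEqual(currentSelector, desiredSelector) {
		return actionRecreate
	}
	return actionUpdate
}

func main() {
	// Desired selector, as reported in the error above.
	desired := map[string]string{
		"app":           "capi-provider-controller-manager",
		"control-plane": "capi-provider-controller-manager",
		"hypershift.openshift.io/control-plane-component": "capi-provider-controller-manager",
	}
	// Hypothetical selector on a Deployment created by an older operator.
	older := map[string]string{"control-plane": "capi-provider-controller-manager"}

	fmt.Println(decideAction(true, older, desired))   // delete-and-recreate
	fmt.Println(decideAction(true, desired, desired)) // update
}
```

Deleting the Deployment briefly removes the controller's pods, so a real fix may prefer a different approach; the sketch only shows why an in-place update cannot succeed.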

Additional info:

https://github.com/openshift/hypershift/pull/3108

Bug OCPBUGS-23770: After PatternFly5 update: Topology sidebar layout issue

Issue 44 from https://docs.google.com/spreadsheets/d/1TR3ENY-GE_LQL9F-xH6NahtRHdu6IrYGhLqDDA8y0EI/edit#gid=1035185624

The Observe tab shows Metrics and Events inside an Accordion component whose blue border sits directly against the sidebar container. Either remove the border or add spacing between it and the sidebar container.

Screenshot: https://drive.google.com/file/d/1i8SMUwTYXZL4CG0r1UXnxnm5e8QdAhQK/view?usp=sharing

https://github.com/openshift/console/pull/13363

Bug OCPBUGS-23921: CCM uses MC's KAS instead of HC's KAS

Description of problem:

    Because of the way the CCM is deployed, it takes its kubeconfig from the environment it runs in, which is the management cluster. As a result, it communicates with the Kubernetes API Server (KAS) of the management cluster (MC) instead of the KAS of the hosted cluster (HC) it is part of.

Version-Release number of selected component (if applicable):

    4.15.0

How reproducible:

    100%

Steps to Reproduce:

    1. Deploy a hosted cluster
    2. Use oc debug to open a shell on the node running the HC CCM
    3. crictl ps -a to list all the containers
    4. crictl inspect X  # Where X is the container id of the CCM container
    5. nsenter -n -t pid_of_ccm_container
    6. tcpdump

Actual results:

    Communication goes to MC KAS

Expected results:

    Communication goes to HC KAS
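
When no kubeconfig is supplied, client-go's in-cluster configuration locates the API server through the KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT environment variables, and for a pod scheduled on the management cluster those point at the MC KAS. The Go sketch below models that fallback with a hypothetical `apiServerSource` helper; the kubeconfig path and service IP are illustrative assumptions, not values taken from the CCM deployment.

```go
package main

import (
	"fmt"
	"net"
)

// apiServerSource is a hypothetical, simplified model of how a Kubernetes
// client chooses its API server: an explicit kubeconfig wins, otherwise the
// in-cluster service environment variables are used.
func apiServerSource(kubeconfigPath string, getenv func(string) string) string {
	if kubeconfigPath != "" {
		return "kubeconfig:" + kubeconfigPath
	}
	host, port := getenv("KUBERNETES_SERVICE_HOST"), getenv("KUBERNETES_SERVICE_PORT")
	if host == "" || port == "" {
		return "unconfigured"
	}
	return "https://" + net.JoinHostPort(host, port)
}

func main() {
	// Hypothetical environment of a control-plane pod on the management cluster.
	mcEnv := map[string]string{
		"KUBERNETES_SERVICE_HOST": "172.30.0.1",
		"KUBERNETES_SERVICE_PORT": "443",
	}
	lookup := func(k string) string { return mcEnv[k] }

	// No kubeconfig: the client falls back to the MC KAS.
	fmt.Println(apiServerSource("", lookup)) // https://172.30.0.1:443
	// Hypothetical mounted hosted-cluster kubeconfig: the HC KAS would be used.
	fmt.Println(apiServerSource("/etc/kubernetes/hc-kubeconfig", lookup))
}
```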

Additional info:

https://github.com/openshift/hypershift/pull/3222

Bug OCPBUGS-19394: Document usage of kebab menu in the TableData component

Description of problem:

We should document how to preserve the kebab menu in the TableData component when building a list page for a dynamic plugin.
Currently, {className: "pf-c-table__action", id: ""} needs to be set on the component for the column to be preserved, which is not at all obvious to plugin creators.
There is also an upstream issue that should address this, either by making the setting more obvious or at least by documenting it better.
Either way, we should document the current behavior in our docs, code, and examples.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/console/pull/13177

Bug OCPBUGS-22476: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/vmware-vsphere-csi-driver-operator/pull/183

Bug OCPBUGS-26555: Power VS: machine-api is unable to launch VMs in new Power VS regions

Description of problem:

Version-Release number of selected component (if applicable):

    4.15

How reproducible:

Easily

Steps to Reproduce:

    1. Deploy in wdc with 4.15
    2. Observe that workers don't launch
    3. Installer fails

Actual results:

    worker nodes will not launch

Expected results:

    install completes

Additional info:

https://github.com/openshift/machine-api-provider-powervs/pull/72

Bug OCPBUGS-20105: HyperShift Operator does not guarantee that there are two nodes with labels for serving nodes

Description of problem:

The HyperShift Operator does not guarantee that two request-serving nodes are labeled with the HCP's namespace-name. It likely labels the nodes initially and then does not notice when those nodes are deleted by something else.

Version-Release number of selected component (if applicable):

How reproducible:

100%

Steps to Reproduce:

1. Create a HCP with dedicated request serving nodes
2. Delete one of the request serving nodes (via deleting the node directly or its machine)
3. Observe that the replacement node does not have the required label for scheduling its request-serving pods

Actual results:

HCPs can exist without two nodes labeled with the HCP's name, causing kube-apiserver pods to become unschedulable

❯ k get no -lhypershift.openshift.io/cluster=ocm-staging-26ljge23ub1112ve884u0opvkj2c4lpc-perf-rhcp-0012
NAME                                        STATUS   ROLES    AGE   VERSION
ip-10-0-34-188.us-east-2.compute.internal   Ready    worker   9h    v1.27.6+1648878

❯ k get po -n ocm-staging-26ljge23ub1112ve884u0opvkj2c4lpc-perf-rhcp-0012 -lapp=kube-apiserver -owide   
NAME                             READY   STATUS    RESTARTS   AGE    IP             NODE                                        NOMINATED NODE   READINESS GATES
kube-apiserver-54854bcb7-v88dq   0/5     Pending   0          151m   <none>         <none>                                      <none>           <none>
kube-apiserver-54854bcb7-x5jqt   5/5     Running   0          3h2m   10.128.236.6   ip-10-0-34-188.us-east-2.compute.internal   <none>           <none>

Expected results:

Every HCP has two nodes labeled with the HCP's name

❯ k get po -n ocm-staging-26ljip0ck3d2i1bejp2sipio4okhgttn-perf-rhcp-0017 -l app=kube-apiserver -owide
NAME                            READY   STATUS    RESTARTS   AGE    IP             NODE                                        NOMINATED NODE   READINESS GATES
kube-apiserver-5f85cd4b-l57qr   5/5     Running   0          169m   10.128.218.6   ip-10-0-114-35.us-east-2.compute.internal   <none>           <none>
kube-apiserver-5f85cd4b-lqfsx   5/5     Running   0          169m   10.128.129.6   ip-10-0-59-232.us-east-2.compute.internal   <none>           <none>

❯ k get no -lhypershift.openshift.io/cluster=ocm-staging-26ljip0ck3d2i1bejp2sipio4okhgttn-perf-rhcp-0017
NAME                                        STATUS   ROLES    AGE    VERSION
ip-10-0-114-35.us-east-2.compute.internal   Ready    worker   24h    v1.27.6+1648878
ip-10-0-59-232.us-east-2.compute.internal   Ready    worker   5d2h   v1.27.6+1648878
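
One way to make the labeling self-healing is to recompute, on every reconcile, how many nodes currently carry the HCP's hypershift.openshift.io/cluster label and to label replacements whenever the count drops below two. A minimal Go sketch of that count, using a simplified hypothetical `node` type instead of the Kubernetes API types; it is not the HyperShift Operator's implementation.

```go
package main

import "fmt"

// clusterLabel is the label key used in the `k get no -l...` commands above.
const clusterLabel = "hypershift.openshift.io/cluster"

// node is a hypothetical stand-in for a Kubernetes Node.
type node struct {
	name   string
	labels map[string]string
}

// missingServingNodes reports how many more nodes need clusterLabel=hcpName
// so that at least `want` request-serving nodes carry it. Because it is
// computed from the current node list, a deleted node is noticed on the next
// reconcile instead of being assumed to still exist.
func missingServingNodes(nodes []node, hcpName string, want int) int {
	have := 0
	for _, n := range nodes {
		if n.labels[clusterLabel] == hcpName {
			have++
		}
	}
	if have >= want {
		return 0
	}
	return want - have
}

func main() {
	hcp := "ocm-staging-26ljge23ub1112ve884u0opvkj2c4lpc-perf-rhcp-0012"
	// Mirrors the actual results above: one labeled node survives and the
	// replacement node has no label.
	current := []node{
		{name: "ip-10-0-34-188.us-east-2.compute.internal", labels: map[string]string{clusterLabel: hcp}},
		{name: "replacement-node"}, // hypothetical unlabeled replacement
	}
	fmt.Println(missingServingNodes(current, hcp, 2)) // 1
}
```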

Additional info:

https://github.com/openshift/hypershift/pull/3077

Bug OCPBUGS-22319: invalid memory address or nil pointer dereference in MAPO/CAPO v1alpha7

Description of problem:

Impossible to create NFV workers

Version-Release number of selected component (if applicable):

4.15 (current master)

Actual results:

I1024 02:36:28.388445       1 controller.go:156] sj6vp0y3-56ae0-2f4wl-worker-0-ph4nw: reconciling Machine
I1024 02:36:29.068382       1 controller.go:349] sj6vp0y3-56ae0-2f4wl-worker-0-ph4nw: reconciling machine triggers idempotent create
I1024 02:36:31.426442       1 controller.go:115]  "msg"="Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference" "controller"="machine-controller" "name"="sj6vp0y3-56ae0-2f4wl-worker-0-ph4nw" "namespace"="openshift-machine-api" "object"={"name":"sj6vp0y3-56ae0-2f4wl-worker-0-ph4nw","namespace":"openshift-machine-api"} "reconcileID"="1041b0ba-067a-4e94-8a2a-f71f46821275"
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x27c49ff]

goroutine 247 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
	/go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116 +0x1fa
panic({0x2a72f60, 0x430deb0})
	/usr/lib/golang/src/runtime/panic.go:884 +0x213
github.com/openshift/machine-api-provider-openstack/pkg/machine.MachineToInstanceSpec(0xc0006698c0, {0xc000a49940, 0x1, 0x4}, {0xc000a49980, 0x1, 0x4}, {0xc00029aa00, 0x6a6}, {0x30ab820, ...}, ...)
	/go/src/sigs.k8s.io/cluster-api-provider-openstack/pkg/machine/convert.go:317 +0xb9f
github.com/openshift/machine-api-provider-openstack/pkg/machine.(*OpenstackClient).convertMachineToCapoInstanceSpec(0xc0000f11f0, {0x30cb3b0, 0xc000c50b80}, 0xc0006698c0)
	/go/src/sigs.k8s.io/cluster-api-provider-openstack/pkg/machine/actuator.go:157 +0x23b
github.com/openshift/machine-api-provider-openstack/pkg/machine.(*OpenstackClient).createInstance(0xc0000f11f0, {0xc000c50b80?, 0xc00072a1b0?}, 0xc0006698c0, {0x30cb3b0, 0xc000c50b80})
	/go/src/sigs.k8s.io/cluster-api-provider-openstack/pkg/machine/actuator.go:246 +0x137
github.com/openshift/machine-api-provider-openstack/pkg/machine.(*OpenstackClient).reconcile(0xc0000f11f0, {0x30c5530, 0xc00072a1b0}, 0xc0006698c0)
	/go/src/sigs.k8s.io/cluster-api-provider-openstack/pkg/machine/actuator.go:201 +0x23e
github.com/openshift/machine-api-provider-openstack/pkg/machine.(*OpenstackClient).Create(0xc000a42150?, {0x30c5530?, 0xc00072a1b0?}, 0x0?)
	/go/src/sigs.k8s.io/cluster-api-provider-openstack/pkg/machine/actuator.go:172 +0x25
github.com/openshift/machine-api-operator/pkg/controller/machine.(*ReconcileMachine).Reconcile(0xc0002ab6d0, {0x30c5530, 0xc00072a1b0}, {{{0xc000c90a50?, 0x0?}, {0xc0000014a0?, 0xc00087bd48?}}})
	/go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/github.com/openshift/machine-api-operator/pkg/controller/machine/controller.go:350 +0xbb8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x30c9578?, {0x30c5530?, 0xc00072a1b0?}, {{{0xc000c90a50?, 0xb?}, {0xc0000014a0?, 0x0?}}})
	/go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119 +0xc8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000322a00, {0x30c5488, 0xc00028e2d0}, {0x2b57480?, 0xc0000e64c0?})
	/go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316 +0x3ca
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000322a00, {0x30c5488, 0xc00028e2d0})
	/go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266 +0x1d9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
	/go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
	/go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:223 +0x587

Expected results:

Machine creation should succeed, and the NFV workers should be created without the controller panicking.
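
The panic is a nil pointer dereference inside MachineToInstanceSpec while converting the MAPO provider spec into a CAPO v1alpha7 instance spec. The usual remedy is to check optional pointer fields before dereferencing them and return an error instead of panicking. A minimal Go sketch of that pattern; the `providerSpec`, `portOpts`, and `instanceSpec` types are hypothetical simplifications, and the field that is actually nil at convert.go:317 is not identified here.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, heavily simplified stand-ins for the MAPO provider spec and
// the CAPO v1alpha7 instance spec.
type portOpts struct {
	NetworkID string
}

type providerSpec struct {
	Ports []*portOpts // entries are pointers and may be nil
}

type instanceSpec struct {
	NetworkIDs []string
}

// toInstanceSpec converts a provider spec, checking every pointer before it
// is dereferenced so that malformed input yields an error, not a panic.
func toInstanceSpec(spec *providerSpec) (*instanceSpec, error) {
	if spec == nil {
		return nil, errors.New("provider spec is nil")
	}
	out := &instanceSpec{}
	for i, p := range spec.Ports {
		if p == nil {
			return nil, fmt.Errorf("ports[%d] is nil", i)
		}
		out.NetworkIDs = append(out.NetworkIDs, p.NetworkID)
	}
	return out, nil
}

func main() {
	if _, convErr := toInstanceSpec(&providerSpec{Ports: []*portOpts{nil}}); convErr != nil {
		fmt.Println("conversion failed:", convErr) // conversion failed: ports[0] is nil
	}
	if res, convErr := toInstanceSpec(&providerSpec{Ports: []*portOpts{{NetworkID: "net-1"}}}); convErr == nil {
		fmt.Println(res.NetworkIDs) // [net-1]
	}
}
```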

I think this is related to https://github.com/openshift/machine-api-provider-openstack/pull/87

https://github.com/openshift/machine-api-provider-openstack/pull/91

Bug OCPBUGS-24138: Update 4.15 cluster-version-operator-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-version-operator/pull/1000

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-version-operator/pull/1000

Bug OCPBUGS-11437: MCO keeps the pull secret to .orig file once it replaced

Description of problem:

If the cluster's global pull secret is replaced with an empty one, the MCO keeps the original secret file at `/etc/machine-config-daemon/orig/var/lib/kubelet/config.json.mcdorig`.

Version-Release number of selected component (if applicable):

4.12.z

Steps to Reproduce:

1. Create an SNO cluster using cluster-bot
- launch 4.12.9 aws,single-node 

2. Replace the pull secret
```
$ cat <<EOF | oc replace -f -
apiVersion: v1
data:
  .dockerconfigjson: e30K
kind: Secret
metadata:
  name: pull-secret
  namespace: openshift-config
type: kubernetes.io/dockerconfigjson
EOF
```

3. Wait for the cluster to reconcile
```
$ oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          f6c21976e39cf6cb9e2ca71141478d5e612fb53f   3.2.0             56m
00-worker                                          f6c21976e39cf6cb9e2ca71141478d5e612fb53f   3.2.0             56m
01-master-container-runtime                        f6c21976e39cf6cb9e2ca71141478d5e612fb53f   3.2.0             56m
01-master-kubelet                                  f6c21976e39cf6cb9e2ca71141478d5e612fb53f   3.2.0             56m
01-worker-container-runtime                        f6c21976e39cf6cb9e2ca71141478d5e612fb53f   3.2.0             56m
01-worker-kubelet                                  f6c21976e39cf6cb9e2ca71141478d5e612fb53f   3.2.0             56m
99-master-generated-kubelet                        f6c21976e39cf6cb9e2ca71141478d5e612fb53f   3.2.0             56m
99-master-generated-registries                     f6c21976e39cf6cb9e2ca71141478d5e612fb53f   3.2.0             56m
99-master-ssh                                                                                 3.2.0             60m
99-worker-generated-registries                     f6c21976e39cf6cb9e2ca71141478d5e612fb53f   3.2.0             56m
99-worker-ssh                                                                                 3.2.0             60m
rendered-master-50d505c46c5e1dae8f1d91c81b2e0d1e   f6c21976e39cf6cb9e2ca71141478d5e612fb53f   3.2.0             56m
rendered-master-619b2780e8787c88c3acb0c68de45a9f   f6c21976e39cf6cb9e2ca71141478d5e612fb53f   3.2.0             36m
rendered-master-801d3c549c0fb3267cafc7e48968a8ac   f6c21976e39cf6cb9e2ca71141478d5e612fb53f   3.2.0             56m
rendered-worker-86690adc0446e7f7feb68f9b9690632d   f6c21976e39cf6cb9e2ca71141478d5e612fb53f   3.2.0             36m
rendered-worker-d7e635328a14333ed6ad27603fe5b5db   f6c21976e39cf6cb9e2ca71141478d5e612fb53f   3.2.0             56m
```

4. Debug into the node and check the file
```
$ cat /etc/machine-config-daemon/orig/var/lib/kubelet/config.json.mcdorig
```

Actual results:

The .mcdorig file still contains the original pull secret that was used during the initial cluster provisioning.

Expected results:

There shouldn't be any file with this info

Additional info:

https://github.com/openshift/machine-config-operator/pull/3759

Bug OCPBUGS-26413: [gcp] perms errors

This is a clone of issue OCPBUGS-25654. The following is the description of the original issue:
—
Description of problem:

    Permission-related errors in the capi, capg, and cluster-capi-operator logs

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

    Always

Steps to Reproduce:

    1. Install a tech preview cluster with the new PRs [https://issues.redhat.com/browse/OCPCLOUD-1718]
    2. Run the ClusterInfrastructure regression suite
    
    Example run - https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/ginkgo-test/219040/testReport/

Actual results:

    Tests related to CCM and CPMS fail

Expected results:

    tests pass

Additional info:

    The test analysis is done, and Joel has also helped with new commits to the MAPI PRs that fix the MAPI-related issues, but the other repos are still a work in progress.

Logs:
cluster-capi-operator errors:

[miyadav@miyadav ~]$ oc logs capi-controller-manager-74d65dd8f4-s5rlh --kubeconfig kk2 | grep -i denied
[miyadav@miyadav ~]$ oc logs capi-controller-manager-74d65dd8f4-s5rlh --kubeconfig kk2 | grep -i error
[miyadav@miyadav ~]$ oc logs cluster-capi-operator-66b7f99b9d-bbqxz --kubeconfig kk2 | grep -i error 
E1214 06:19:17.025379       1 kind.go:63] controller-runtime/source/EventHandler "msg"="if kind is a CRD, it should be installed before calling Start" "error"="failed to get restmapping: no matches for kind \"GCPCluster\" in group \"infrastructure.cluster.x-k8s.io\"" "kind"={"Group":"infrastructure.cluster.x-k8s.io","Kind":"GCPCluster"}
E1214 06:19:17.025874       1 kind.go:68] controller-runtime/source/EventHandler "msg"="failed to get informer from cache" "error"="failed to get restmapping: failed to find API group \"cluster.x-k8s.io\"" 
E1214 06:19:17.072299       1 kind.go:63] controller-runtime/source/EventHandler "msg"="if kind is a CRD, it should be installed before calling Start" "error"="failed to get restmapping: no matches for kind \"GCPCluster\" in group \"infrastructure.cluster.x-k8s.io\"" "kind"={"Group":"infrastructure.cluster.x-k8s.io","Kind":"GCPCluster"}
E1214 06:19:17.312724       1 kind.go:68] controller-runtime/source/EventHandler "msg"="failed to get informer from cache" "error"="failed to get restmapping: failed to find API group \"cluster.x-k8s.io\"" 
E1214 06:23:21.928322       1 leaderelection.go:327] error retrieving resource lock openshift-cluster-api/cluster-capi-operator-leader: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-cluster-api/leases/cluster-capi-operator-leader": dial tcp 172.30.0.1:443: connect: connection refused
E1214 06:23:43.558393       1 controller.go:324]  "msg"="Reconciler error" "error"="error during reconcile: failed to set conditions for CAPI Installer controller: Put \"https://172.30.0.1:443/apis/config.openshift.io/v1/clusteroperators/cluster-api/status\": dial tcp 172.30.0.1:443: connect: connection refused" "ClusterOperator"={"name":"cluster-api"} "controller"="clusteroperator" "controllerGroup"="config.openshift.io" "controllerKind"="ClusterOperator" "name"="cluster-api" "namespace"="" "reconcileID"="e36d1c19-dd22-4095-8d6b-50101f2bbefe"
E1214 06:23:47.931676       1 leaderelection.go:327] error retrieving resource lock openshift-cluster-api/cluster-capi-operator-leader: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-cluster-api/leases/cluster-capi-operator-leader": dial tcp 172.30.0.1:443: connect: connection refused
E1214 06:24:03.625555       1 controller.go:324]  "msg"="Reconciler error" "error"="error during reconcile: error applying CAPI provider \"cluster-api\" components: error applying CAPI provider component \"apiextensions.k8s.io/v1/CustomResourceDefinition - clusterclasses.cluster.x-k8s.io\" at position 0: Get \"https://172.30.0.1:443/apis/apiextensions.k8s.io/v1/customresourcedefinitions/clusterclasses.cluster.x-k8s.io\": dial tcp 172.30.0.1:443: connect: connection refused\nerror applying CAPI provider component \"apiextensions.k8s.io/v1/CustomResourceDefinition - clusters.cluster.x-k8s.io\" at position 1: Get \"https://172.30.0.1:443/apis/apiextensions.k8s.io/v1/customresourcedefinitions/clusters.cluster.x-k8s.io\": dial tcp 172.30.0.1:443: connect: connection refused\nerror applying CAPI provider component \"apiextensions.k8s.io/v1/CustomResourceDefinition - machines.cluster.x-k8s.io\" at position 2: Get \"https://172.30.0.1:443/apis/apiextensions.k8s.io/v1/customresourcedefinitions/machines.cluster.x-k8s.io\": dial tcp 172.30.0.1:443: connect: connection refused\nerror applying CAPI provider component \"apiextensions.k8s.io/v1/CustomResourceDefinition - machinesets.cluster.x-k8s.io\" at position 3: Get \"https://172.30.0.1:443/apis/apiextensions.k8s.io/v1/customresourcedefinitions/machinesets.cluster.x-k8s.io\": dial tcp 172.30.0.1:443: connect: connection refused\nerror applying CAPI provider component \"apiextensions.k8s.io/v1/CustomResourceDefinition - machinedeployments.cluster.x-k8s.io\" at position 4: Get \"https://172.30.0.1:443/apis/apiextensions.k8s.io/v1/customresourcedefinitions/machinedeployments.cluster.x-k8s.io\": dial tcp 172.30.0.1:443: connect: connection refused\nerror applying CAPI provider component \"apiextensions.k8s.io/v1/CustomResourceDefinition - machinepools.cluster.x-k8s.io\" at position 5: Get \"https://172.30.0.1:443/apis/apiextensions.k8s.io/v1/customresourcedefinitions/machinepools.cluster.x-k8s.io\": dial tcp 172.30.0.1:443: connect: connection refused\nerror applying CAPI provider component \"apiextensions.k8s.io/v1/CustomResourceDefinition - clusterresourcesets.addons.cluster.x-k8s.io\" at position 6: Get \"https://172.30.0.1:443/apis/apiextensions.k8s.io/v1/customresourcedefinitions/clusterresourcesets.addons.cluster.x-k8s.io\": dial tcp 172.30.0.1:443: connect: connection refused\nerror applying CAPI provider component \"apiextensions.k8s.io/v1/CustomResourceDefinition - clusterresourcesetbindings.addons.cluster.x-k8s.io\" at position 7: Get \"https://172.30.0.1:443/apis/apiextensions.k8s.io/v1/customresourcedefinitions/clusterresourcesetbindings.addons.cluster.x-k8s.io\": dial tcp 172.30.0.1:443: connect: connection refused\nerror applying CAPI provider component \"apiextensions.k8s.io/v1/CustomResourceDefinition - machinehealthchecks.cluster.x-k8s.io\" at position 8: Get \"https://172.30.0.1:443/apis/apiextensions.k8s.io/v1/customresourcedefinitions/machinehealthchecks.cluster.x-k8s.io\": dial tcp 172.30.0.1:443: connect: connection refused\nerror applying CAPI provider component \"apiextensions.k8s.io/v1/CustomResourceDefinition - extensionconfigs.runtime.cluster.x-k8s.io\" at position 9: Get \"https://172.30.0.1:443/apis/apiextensions.k8s.io/v1/customresourcedefinitions/extensionconfigs.runtime.cluster.x-k8s.io\": dial tcp 172.30.0.1:443: connect: connection refused\nerror applying CAPI provider component \"apiextensions.k8s.io/v1/CustomResourceDefinition - ipaddresses.ipam.cluster.x-k8s.io\" at position 10: Get \"https://172.30.0.1:443/apis/apiextensions.k8s.io/v1/customresourcedefinitions/ipaddresses.ipam.cluster.x-k8s.io\": dial tcp 172.30.0.1:443: connect: connection refused\nerror applying CAPI provider component \"apiextensions.k8s.io/v1/CustomResourceDefinition - ipaddressclaims.ipam.cluster.x-k8s.io\" at position 11: Get \"https://172.30.0.1:443/apis/apiextensions.k8s.io/v1/customresourcedefinitions/ipaddressclaims.ipam.cluster.x-k8s.io\": dial tcp 172.30.0.1:443: connect: connection refused\nerror applying CAPI provider component \"rbac.authorization.k8s.io/v1/ClusterRoleBinding - capi-manager-rolebinding\" at position 12: Get \"https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings/capi-manager-rolebinding\": dial tcp 172.30.0.1:443: connect: connection refused\nerror applying CAPI provider component \"rbac.authorization.k8s.io/v1/ClusterRole - capi-manager-role\" at position 13: Get \"https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/clusterroles/capi-manager-role\": dial tcp 172.30.0.1:443: connect: connection refused" "ClusterOperator"={"name":"cluster-api"} "controller"="clusteroperator" "controllerGroup"="config.openshift.io" "controllerKind"="ClusterOperator" "name"="cluster-api" "namespace"="" "reconcileID"="973b6337-9db3-4543-aa4f-e417b016e32f"
E1214 06:25:58.205862       1 leaderelection.go:327] error retrieving resource lock openshift-cluster-api/cluster-capi-operator-leader: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-cluster-api/leases/cluster-capi-operator-leader": dial tcp 172.30.0.1:443: connect: connection refused
E1214 06:29:53.798600       1 leaderelection.go:327] error retrieving resource lock openshift-cluster-api/cluster-capi-operator-leader: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-cluster-api/leases/cluster-capi-operator-leader": dial tcp 172.30.0.1:443: connect: connection refused
E1214 06:33:20.139517       1 leaderelection.go:327] error retrieving resource lock openshift-cluster-api/cluster-capi-operator-leader: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-cluster-api/leases/cluster-capi-operator-leader": dial tcp 172.30.0.1:443: connect: connection refused
E1214 06:34:16.142400       1 leaderelection.go:327] error retrieving resource lock openshift-cluster-api/cluster-capi-operator-leader: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-cluster-api/leases/cluster-capi-operator-leader": dial tcp 172.30.0.1:443: i/o timeout
E1214 06:45:15.546142       1 kubeconfig.go:81] KubeconfigController "msg"="Error reconciling kubeconfig" "error"="error generating kubeconfig: token can't be empty" "Secret"={"name":"cluster-capi-operator-secret","namespace":"openshift-cluster-api"} "controller"="secret" "controllerGroup"="" "controllerKind"="Secret" "name"="cluster-capi-operator-secret" "namespace"="openshift-cluster-api" "reconcileID"="910273fa-6f22-4326-a330-a235be2c6cc4"
E1214 06:45:15.560795       1 controller.go:324]  "msg"="Reconciler error" "error"="error generating kubeconfig: token can't be empty" "Secret"={"name":"cluster-capi-operator-secret","namespace":"openshift-cluster-api"} "controller"="secret" "controllerGroup"="" "controllerKind"="Secret" "name"="cluster-capi-operator-secret" "namespace"="openshift-cluster-api" "reconcileID"="910273fa-6f22-4326-a330-a235be2c6cc4"
E1214 06:45:15.567938       1 kubeconfig.go:81] KubeconfigController "msg"="Error reconciling kubeconfig" "error"="error generating kubeconfig: token can't be empty" "Secret"={"name":"cluster-capi-operator-secret","namespace":"openshift-cluster-api"} "controller"="secret" "controllerGroup"="" "controllerKind"="Secret" "name"="cluster-capi-operator-secret" "namespace"="openshift-cluster-api" "reconcileID"="d6e13dc5-9b90-42f3-bcbd-c451bf4359a9"

capg errors

[miyadav@miyadav ~]$ oc logs capg-controller-manager-6b54798bb9-x6vxk --kubeconfig kk2 | grep -i denied
E1214 07:26:10.892932       1 reconcile.go:152]  "msg"="Error creating an instance" "error"="googleapi: Error 400: SERVICE_ACCOUNT_ACCESS_DENIED - The user does not have access to service account 'miyadav-1412v3-28f9k-w@openshift-qe.iam.gserviceaccount.com'.  User: 'miyadav-1412-openshift-c-v5vsh@openshift-qe.iam.gserviceaccount.com'.  Ask a project owner to grant you the iam.serviceAccountUser role on the service account" "GCPMachine"={"name":"gcp-machinetemplate-6pgrk","namespace":"openshift-cluster-api"} "controller"="gcpmachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="GCPMachine" "name"="gcp-machinetemplate-6pgrk" "namespace"="openshift-cluster-api" "reconcileID"="1cca1651-62b0-4939-b1fb-f7006dbef4eb" "zone"="us-central1-b"
E1214 07:26:10.892988       1 gcpmachine_controller.go:229]  "msg"="Error reconciling instance resources" "error"="googleapi: Error 400: SERVICE_ACCOUNT_ACCESS_DENIED - The user does not have access to service account 'miyadav-1412v3-28f9k-w@openshift-qe.iam.gserviceaccount.com'.  User: 'miyadav-1412-openshift-c-v5vsh@openshift-qe.iam.gserviceaccount.com'.  Ask a project owner to grant you the iam.serviceAccountUser role on the service account" "GCPMachine"={"name":"gcp-machinetemplate-6pgrk","namespace":"openshift-cluster-api"} "controller"="gcpmachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="GCPMachine" "name"="gcp-machinetemplate-6pgrk" "namespace"="openshift-cluster-api" "reconcileID"="1cca1651-62b0-4939-b1fb-f7006dbef4eb"
E1214 07:26:10.911565       1 controller.go:324]  "msg"="Reconciler error" "error"="googleapi: Error 400: SERVICE_ACCOUNT_ACCESS_DENIED - The user does not have access to service account 'miyadav-1412v3-28f9k-w@openshift-qe.iam.gserviceaccount.com'.  User: 'miyadav-1412-openshift-c-v5vsh@openshift-qe.iam.gserviceaccount.com'.  Ask a project owner to grant you the iam.serviceAccountUser role on the service account" "GCPMachine"={"name":"gcp-machinetemplate-6pgrk","namespace":"openshift-cluster-api"} "controller"="gcpmachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="GCPMachine" "name"="gcp-machinetemplate-6pgrk" "namespace"="openshift-cluster-api" "reconcileID"="1cca1651-62b0-4939-b1fb-f7006dbef4eb"

https://github.com/openshift/cluster-capi-operator/pull/155

Bug OCPBUGS-18948: OLM CRD compatibility check logic is incorrect

Description of problem:

OLM is supposed to verify that an update to a CRD does not introduce validation that is more restrictive than what is currently in effect. The logic for this only works if a CRD uses a single spec.validation entry, but this is unlikely to ever be the case. Instead, most CRDs use per-version validation schemas.
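A minimal Python sketch of the intended check follows. It assumes the CRD and existing CRs are plain dicts parsed from YAML. It collects every schema the updated CRD defines, both the legacy top-level `spec.validation.openAPIV3Schema` and each `spec.versions[].schema.openAPIV3Schema`, and validates every existing CR against all of them. `validate`, `crd_schemas`, and `incompatible` are hypothetical helpers, and `validate` covers only a tiny OpenAPI v3 subset (type, required, properties, enum, minimum, maximum). OLM itself is written in Go and relies on the Kubernetes apiextensions validators, so this illustrates the logic and is not OLM's code.

```python
from typing import Any

# OpenAPI v3 type names mapped to Python types (subset used by this sketch).
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
}


def validate(obj: Any, schema: dict, path: str = "") -> list:
    """Return violations of a tiny OpenAPI v3 subset (hypothetical helper)."""
    expected = schema.get("type")
    if expected:
        ok = isinstance(obj, _TYPES[expected])
        if expected in ("integer", "number") and isinstance(obj, bool):
            ok = False  # bool is a subclass of int in Python
        if not ok:
            return [f"{path or '.'}: expected {expected}"]
    errors = []
    if "enum" in schema and obj not in schema["enum"]:
        errors.append(f"{path}: {obj!r} not in enum")
    if isinstance(obj, (int, float)) and not isinstance(obj, bool):
        if "minimum" in schema and obj < schema["minimum"]:
            errors.append(f"{path}: {obj} < minimum {schema['minimum']}")
        if "maximum" in schema and obj > schema["maximum"]:
            errors.append(f"{path}: {obj} > maximum {schema['maximum']}")
    if isinstance(obj, dict):
        for key in schema.get("required", []):
            if key not in obj:
                errors.append(f"{path}.{key}: required")
        for key, sub in schema.get("properties", {}).items():
            if key in obj:
                errors.extend(validate(obj[key], sub, f"{path}.{key}"))
    return errors


def crd_schemas(crd: dict) -> dict:
    """Collect every schema a CRD defines: legacy spec.validation and per-version."""
    spec = crd.get("spec", {})
    schemas = {}
    legacy = spec.get("validation", {}).get("openAPIV3Schema")
    if legacy:
        schemas["*"] = legacy
    for version in spec.get("versions", []):
        schema = version.get("schema", {}).get("openAPIV3Schema")
        if schema:
            schemas[version["name"]] = schema
    return schemas


def incompatible(new_crd: dict, existing_crs: list) -> list:
    """List existing CRs that would fail any schema in the updated CRD."""
    problems = []
    for version, schema in crd_schemas(new_crd).items():
        for cr in existing_crs:
            name = cr.get("metadata", {}).get("name", "?")
            problems += [f"{name} vs {version}: {e}" for e in validate(cr, schema)]
    return problems


def _schema(replicas: dict) -> dict:
    # Minimal CR schema with a single spec.replicas field.
    return {"openAPIV3Schema": {"type": "object", "properties": {
        "spec": {"type": "object", "properties": {"replicas": replicas}}}}}


# The upgrade adds v2, which caps replicas at 3; an existing CR has 5.
new_crd = {"spec": {"versions": [
    {"name": "v1", "schema": _schema({"type": "integer"})},
    {"name": "v2", "schema": _schema({"type": "integer", "maximum": 3})},
]}}
cr = {"metadata": {"name": "demo"}, "spec": {"replicas": 5}}
print(incompatible(new_crd, [cr]))  # ['demo vs v2: .spec.replicas: 5 > maximum 3']
```

A non-empty result is what should block the upgrade in the reproduction steps; a check that reads only `spec.validation` finds no schemas in this CRD and therefore reports nothing.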

Version-Release number of selected component (if applicable):

How reproducible:

100%

Steps to Reproduce:

1. Create an operator that has a CRD with an entry in spec.versions along with spec.versions[].schema populated with some validation schema.
2. Create a CR
3. Attempt to upgrade to a newer version of the operator, where the CRD is updated to add a new version whose schema validation is more restrictive and will fail against the CR that was previously created

Actual results:

Upgrade succeeds

Expected results:

Upgrade fails

Additional info:

https://github.com/openshift/operator-framework-olm/pull/592

Bug OSASINFRA-3283: MAPO: missing port profile when converting to CAPO v1alpha7

https://github.com/openshift/machine-api-provider-openstack/pull/93

Bug OCPBUGS-19452: DaemonSet fails to scale down during the rolling update when maxUnavailable=0

Description of problem:

The OpenShift DNS daemonset uses the RollingUpdate strategy. The "maxSurge" parameter is set to a non-zero value, which means that the "maxUnavailable" parameter is set to zero. When the user replaces the toleration in the daemonset's template spec (via the OpenShift DNS config API), swapping the one that allows scheduling on master nodes for any other toleration, the new pods still try to be scheduled on the master nodes. The old pods on still-tolerated nodes are recreated only if they happen to be processed before any pod on a node that is no longer tolerated.

The new pods are not expected to be scheduled on nodes that are not tolerated by the new daemonset's template spec. The daemonset controller should simply delete the old pods from nodes that can no longer be tolerated. The old pods on nodes that can still be tolerated should be recreated according to the rolling update parameters.

Version-Release number of selected component (if applicable):

How reproducible:

Always

Steps to Reproduce:
1. Create the daemonset which tolerates "node-role.kubernetes.io/master" taint and has the following rolling update parameters:

$ oc -n openshift-dns get ds dns-default -o yaml | yq .spec.updateStrategy
rollingUpdate:
  maxSurge: 10%
  maxUnavailable: 0
type: RollingUpdate

$ oc  -n openshift-dns get ds dns-default -o yaml | yq .spec.template.spec.tolerations
- key: node-role.kubernetes.io/master
  operator: Exists

2. Let the daemonset be scheduled on all the target nodes (e.g. all masters and all workers)

$ oc -n openshift-dns get pods  -o wide | grep dns-default
dns-default-6bfmf     2/2     Running   0          119m    10.129.0.40   ci-ln-sb5ply2-72292-qlhc8-master-2         <none>           <none>
dns-default-9cjdf     2/2     Running   0          2m35s   10.129.2.15   ci-ln-sb5ply2-72292-qlhc8-worker-c-m5wzq   <none>           <none>
dns-default-c6j9x     2/2     Running   0          119m    10.128.0.13   ci-ln-sb5ply2-72292-qlhc8-master-0         <none>           <none>
dns-default-fhqrs     2/2     Running   0          2m12s   10.131.0.29   ci-ln-sb5ply2-72292-qlhc8-worker-a-6q7hs   <none>           <none>
dns-default-lx2nf     2/2     Running   0          119m    10.130.0.15   ci-ln-sb5ply2-72292-qlhc8-master-1         <none>           <none>
dns-default-mmc78     2/2     Running   0          112m    10.128.2.7    ci-ln-sb5ply2-72292-qlhc8-worker-b-bpjdk   <none>           <none>

3. Update the daemonset's tolerations by removing "node-role.kubernetes.io/master" and adding any other toleration (a toleration for a nonexistent taint also works):

$ oc -n openshift-dns get ds dns-default -o yaml | yq .spec.template.spec.tolerations
- key: test-taint
  operator: Exists

Actual results:

$ oc -n openshift-dns get pods  -o wide | grep dns-default
dns-default-6bfmf     2/2     Running   0          124m    10.129.0.40   ci-ln-sb5ply2-72292-qlhc8-master-2         <none>           <none>
dns-default-76vjz     0/2     Pending   0          3m2s    <none>        <none>                                     <none>           <none>
dns-default-9cjdf     2/2     Running   0          7m24s   10.129.2.15   ci-ln-sb5ply2-72292-qlhc8-worker-c-m5wzq   <none>           <none>
dns-default-c6j9x     2/2     Running   0          124m    10.128.0.13   ci-ln-sb5ply2-72292-qlhc8-master-0         <none>           <none>
dns-default-fhqrs     2/2     Running   0          7m1s    10.131.0.29   ci-ln-sb5ply2-72292-qlhc8-worker-a-6q7hs   <none>           <none>
dns-default-lx2nf     2/2     Running   0          124m    10.130.0.15   ci-ln-sb5ply2-72292-qlhc8-master-1         <none>           <none>
dns-default-mmc78     2/2     Running   0          117m    10.128.2.7    ci-ln-sb5ply2-72292-qlhc8-worker-b-bpjdk   <none>           <none>

Expected results:

$ oc -n openshift-dns get pods  -o wide | grep dns-default
dns-default-9cjdf     2/2     Running   0          7m24s   10.129.2.15   ci-ln-sb5ply2-72292-qlhc8-worker-c-m5wzq   <none>           <none>
dns-default-fhqrs     2/2     Running   0          7m1s    10.131.0.29   ci-ln-sb5ply2-72292-qlhc8-worker-a-6q7hs   <none>           <none>
dns-default-mmc78     2/2     Running   0          7m54s   10.128.2.7    ci-ln-sb5ply2-72292-qlhc8-worker-b-bpjdk   <none>           <none>
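The expected results above amount to a per-node decision: pods on nodes that the new tolerations no longer admit are deleted without replacement, and only still-tolerated nodes go through the surge-based rolling update. A minimal Python sketch of that decision, assuming nodes and tolerations are plain dicts shaped like their Kubernetes counterparts. `tolerates`, `fits`, `plan_rollout`, and the two plan labels are hypothetical names, and toleration matching is simplified to the `Exists`/`Equal` operators and the hard `NoSchedule`/`NoExecute` effects. It is not the upstream DaemonSet controller code.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Simplified toleration/taint matching (Exists and Equal operators only)."""
    if toleration.get("effect") and toleration["effect"] != taint.get("effect"):
        return False
    key = toleration.get("key")
    if toleration.get("operator", "Equal") == "Exists":
        # An Exists toleration without a key matches every taint.
        return key is None or key == taint.get("key")
    return key == taint.get("key") and toleration.get("value") == taint.get("value")


def fits(node: dict, tolerations: list) -> bool:
    """A node admits the pod if each NoSchedule/NoExecute taint is tolerated."""
    for taint in node.get("taints", []):
        if taint.get("effect") not in ("NoSchedule", "NoExecute"):
            continue  # PreferNoSchedule is only a soft preference
        if not any(tolerates(t, taint) for t in tolerations):
            return False
    return True


def plan_rollout(nodes: list, new_tolerations: list) -> dict:
    """For each node running an old pod, decide what the rolling update should do."""
    plan = {}
    for node in nodes:
        if fits(node, new_tolerations):
            # Still a valid target: replace the pod, honoring maxSurge/maxUnavailable.
            plan[node["name"]] = "surge-replace"
        else:
            # No longer tolerated: delete the old pod and do not create a new one.
            plan[node["name"]] = "delete-only"
    return plan


master_taint = {"key": "node-role.kubernetes.io/master", "effect": "NoSchedule"}
nodes = [
    {"name": "master-0", "taints": [master_taint]},
    {"name": "worker-a", "taints": []},
]
print(plan_rollout(nodes, [{"key": "test-taint", "operator": "Exists"}]))
# {'master-0': 'delete-only', 'worker-a': 'surge-replace'}
```

The key point is that "delete-only" nodes must not count toward the surge: a new pod that can never be scheduled there (the Pending `dns-default-76vjz` in the actual results) blocks the rollout when maxUnavailable is 0.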

Additional info:
Upstream issue: https://github.com/kubernetes/kubernetes/issues/118823
Slack discussion: https://redhat-internal.slack.com/archives/CKJR6200N/p1687455135950439

https://github.com/openshift/kubernetes/pull/1716

Bug OCPBUGS-20076: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-network-operator/pull/2049

Bug OCPBUGS-25211: PipelineRun List page list PipelineRuns from all namespace

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/console/pull/13439

Bug OCPBUGS-26510: CCO reports wrong credentials mode in metrics

This is a clone of issue OCPBUGS-26488. The following is the description of the original issue:
—
Description of problem:

CCO reports credsremoved mode in metrics when the cluster is actually in the default mode. 
See https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/47349/rehearse-47349-pull-ci-openshift-cloud-credential-operator-release-4.16-e2e-aws-qe/1744240905512030208 (OCP-31768).

Version-Release number of selected component (if applicable):

4.16

How reproducible:

Always.

Steps to Reproduce:

1. Create an AWS cluster with CCO in the default mode (ends up in mint)
2. Get the value of the cco_credentials_mode metric

Actual results:

credsremoved

Expected results:

mint

Root cause:

The controller-runtime client used in metrics calculator (https://github.com/openshift/cloud-credential-operator/blob/77a68ad01e75162bfa04097b22f80d305c192439/pkg/operator/metrics/metrics.go#L77) is unable to GET the root credentials Secret (https://github.com/openshift/cloud-credential-operator/blob/77a68ad01e75162bfa04097b22f80d305c192439/pkg/operator/metrics/metrics.go#L184) since it is backed by a cache which only contains target Secrets requested by other operators (https://github.com/openshift/cloud-credential-operator/blob/77a68ad01e75162bfa04097b22f80d305c192439/pkg/cmd/operator/cmd.go#L164-L168).
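The failure mode can be reproduced in miniature. This Python sketch assumes, as hypothetical simplifications, that the metrics code maps a missing root Secret to `credsremoved` and otherwise reads the mode from the Secret; `Cluster`, `FilteredCache`, `credentials_mode`, and the `ROOT_SECRET` location are illustrative stand-ins, not CCO's actual types. The point is that a read through a cache holding only operator-requested target Secrets cannot distinguish "root Secret deleted" from "root Secret never cached". In controller-runtime, `Manager.GetAPIReader()` returns a reader that bypasses the cache; whether the linked PR uses it is not stated here.

```python
# Hypothetical location of the root cloud credentials Secret.
ROOT_SECRET = ("kube-system", "aws-creds")


class Cluster:
    """Stand-in for the API server: every Secret is visible."""

    def __init__(self, secrets: dict):
        self.secrets = secrets  # {(namespace, name): secret data}

    def get(self, namespace: str, name: str):
        return self.secrets.get((namespace, name))


class FilteredCache:
    """Stand-in for a cache that only holds Secrets matching a selector."""

    def __init__(self, cluster: Cluster, keep):
        self._secrets = {k: v for k, v in cluster.secrets.items() if keep(v)}

    def get(self, namespace: str, name: str):
        return self._secrets.get((namespace, name))


def credentials_mode(reader) -> str:
    """Hypothetical simplification of the metric's mode calculation."""
    root = reader.get(*ROOT_SECRET)
    if root is None:
        # "Not found" is taken to mean the admin removed the root credentials.
        return "credsremoved"
    return root.get("mode", "mint")


cluster = Cluster({
    ROOT_SECRET: {"mode": "mint"},
    ("openshift-image-registry", "installer-cloud-credentials"): {"target": True},
})
cache = FilteredCache(cluster, keep=lambda secret: secret.get("target", False))
print(credentials_mode(cache))    # credsremoved  (cache never held the root Secret)
print(credentials_mode(cluster))  # mint          (uncached read)
```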

https://github.com/openshift/cloud-credential-operator/pull/646

Story TRT-1378: openshift-cloud-controller-manager-operator trying to pull 4.2.0 image for kube-rbac-proxy

Causing payload rejection now.

https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/304/files caused it

From Trevor:

https://redhat-internal.slack.com/archives/CBZHF4DHC/p1701485079971669

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/27231/pull-ci-openshift-origin-master-e2e-metal-ipi-ovn-ipv6/1730718245385670656
: [sig-arch] events should not repeat pathologically for ns/openshift-cloud-controller-manager-operator 0s
{ 1 events happened too frequently

event happened 374 times, something is wrong: namespace/openshift-cloud-controller-manager-operator node/master-1.ostest.test.metalkube.org pod/cluster-cloud-controller-manager-operator-5b6b87b648-rzdbc hmsg/873af7a9ec - reason/BackOff Back-off pulling image "quay.io/openshift/origin-kube-rbac-proxy:4.2.0" From: 00:53:59Z To: 00:54:00Z result=reject }
4.2 is an old-sounding tag? Seems like not-a-flake, but still gathering data

https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/305

Bug OCPBUGS-24280: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/console/pull/13387

Task OPRUN-3078: Downstream Sync for rukpak v0.15.0

Bring the downstream rukpak repo up-to-date with the v0.15.0 upstream release.

https://github.com/openshift/operator-framework-rukpak/pull/50

Bug OCPBUGS-19213: Update 4.15 ose-openstack-cinder-csi-driver-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/openstack-cinder-csi-driver-operator/pull/133

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/openstack-cinder-csi-driver-operator/pull/133

Bug OCPBUGS-22703: Monitor tests are failing in Local Zone jobs (edge nodes)

Description of problem:

The following presubmit jobs for Local Zones have been permanently failing since August:
- e2e-aws-ovn-localzones: https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-installer-master-e2e-aws-ovn-localzones?buildId=1716457254460329984
- e2e-aws-ovn-shared-vpc-localzones: https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-installer-master-e2e-aws-ovn-shared-vpc-localzones

Investigating, we can see common failures in the tests '[sig-network] can collect <poller_name> poller pod logs', which cause most of the jobs to not complete correctly.

Exploring the code, I can see these tests were added recently, around August, which matches when the failures started.

Running pods on instances located in a Local Zone ("edge nodes") requires tolerating the "node-role.kubernetes.io/edge" taint. I am not sure if I am looking in the correct place, but it seems the deployment only tolerates the master taint: https://github.com/openshift/origin/blob/master/pkg/monitortests/network/disruptionpodnetwork/host-network-target-deployment.yaml#L42

Version-Release number of selected component (if applicable):

4.15.0

How reproducible:

always

Steps to Reproduce:

trigger the job:
1. open a PR on installer
2. run the job
3. check failed tests '[sig-network] can collect <poller_name> poller pod logs' 

Example of 4.15 blocked feature PR (Wavelength Zones): https://github.com/openshift/installer/pull/7369#issuecomment-1783699175

Actual results:

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/7590/pull-ci-openshift-installer-master-e2e-aws-ovn-localzones/1715075142427611136
{  1 pods lacked sampler output: [pod-network-to-pod-network-disruption-poller-d94fb55db-9qfpz]}

E1018 22:06:34.773866       1 disruption_backend_sampler.go:496] not finished writing all samples (1 remaining), but we're told to close
E1018 22:06:34.774669       1 disruption_backend_sampler.go:496] not finished writing all samples (1 remaining), but we're told to close

Expected results:

Should monitor pods be scheduled on edge nodes?
How can we track job failures for new monitor tests?

Additional info:

Edge nodes have NoSchedule taints applied by default; to run monitor pods on those nodes, you need to tolerate the "node-role.kubernetes.io/edge" taint.

See the enhancement for more information: https://github.com/openshift/enhancements/blob/master/enhancements/installer/aws-custom-edge-machineset-local-zones.md#user-workload-deployments

Looking at the must-gather of job 1716457254460329984, you can see the monitor pods were not scheduled due to the missing tolerations:

$ grep -rni pod-network-to-pod-network-disruption-poller-7c97cd5d7-t2mn2 \
  1716457254460329984-must-gather/09abb0d6fc08ee340563e6e11f5ceafb42fb371e50ab6acee6764031062525b7/namespaces/openshift-kube-scheduler/pods/ \
  | awk -F'] "' '{print$2}' | sort | uniq -c
    215 Unable to schedule pod; no fit; waiting" pod="e2e-pod-network-disruption-test-59s5d/pod-network-to-pod-network-disruption-poller-7c97cd5d7-t2mn2" 
err="0/7 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/edge: }, 
6 node(s) didn't match pod anti-affinity rules. preemption: 0/7 nodes are available: 
1 Preemption is not helpful for scheduling, 6 No preemption victims found for incoming pod.."
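A minimal Python sketch of adding the missing toleration, assuming the poller Deployment manifest has been loaded into a dict. `ensure_toleration` is a hypothetical helper that appends an `Exists` toleration for the edge taint with the `NoSchedule` effect only if an identical entry is not already present; the example manifest is also hypothetical. Tolerating the taint only permits scheduling on edge nodes; it does not force the pods onto them.

```python
import copy

EDGE_TOLERATION = {
    "key": "node-role.kubernetes.io/edge",
    "operator": "Exists",
    "effect": "NoSchedule",
}


def ensure_toleration(deployment: dict, toleration: dict) -> dict:
    """Return a copy of a Deployment whose pod template includes `toleration`."""
    patched = copy.deepcopy(deployment)
    pod_spec = (patched.setdefault("spec", {})
                       .setdefault("template", {})
                       .setdefault("spec", {}))
    tolerations = pod_spec.setdefault("tolerations", [])
    if toleration not in tolerations:  # keep the patch idempotent
        tolerations.append(dict(toleration))
    return patched


# Hypothetical manifest that, like the poller Deployment, only tolerates masters.
deployment = {"spec": {"template": {"spec": {"tolerations": [
    {"key": "node-role.kubernetes.io/master", "operator": "Exists", "effect": "NoSchedule"},
]}}}}
patched = ensure_toleration(deployment, EDGE_TOLERATION)
print([t["key"] for t in patched["spec"]["template"]["spec"]["tolerations"]])
# ['node-role.kubernetes.io/master', 'node-role.kubernetes.io/edge']
```

With the edge taint tolerated, the one edge node in the scheduler message above becomes a candidate for the pod that the other six nodes reject on anti-affinity.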

https://github.com/openshift/origin/pull/28363

Bug OCPBUGS-23209: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/machine-config-operator/pull/4026

Bug MGMT-16151: [STG][Scale] Failed to download installation logs form cluster with 103 nodes

A cluster with 3 masters and 100 workers installed successfully.
The attempt to download installation logs failed: nothing happened.
Error raised in the browser debugger console:

Access to XMLHttpRequest at 
'https://api.stage.openshift.com/api/assisted-install/v2/clusters/c7d60db0-2997-4380-813d-b504134e9920/downloads/files-presigned?file_name=logs&logs_type=all' 
from origin 'https://qaprodauth.console.redhat.com' has been blocked by CORS policy: 
No 'Access-Control-Allow-Origin' header is present on the requested resource.

src_bootstrap_tsx-src_moduleOverrides_unfetch_ts-webpack_sharing_consume_default_patternfly_r-31174d.fcbb79a89748b2f6.js:22320     
GET https://api.stage.openshift.com/api/assisted-install/v2/clusters/c7d60db0-2997-4380-813d-b504134e9920/downloads/files-presigned?file_name=logs&logs_type=all 
net::ERR_FAILED 504 (Gateway Timeout)

It happened on browsers:
Chrome 117.0.5938.92
Firefox 117.0.1 (64-bit)

See attached screenshots and logs from Assisted Service pod

I can successfully download installation logs from other clusters using the same browsers.

Steps to reproduce:
1. Install cluster with 103 nodes
2. Try to download installation logs

Actual results:
Nothing happened and error raised

Expected results:
Should download installation logs

Bug OCPBUGS-19189: Update 4.15 ose-cluster-update-keys image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-update-keys/pull/51

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-update-keys/pull/51

Bug OCPBUGS-22459: Konnectivity container in apiserver pod should delay shutdown

Description of problem:

In HyperShift 4.14, the konnectivity server is run inside the kube-apiserver pod. When this pod is deleted for any reason, the konnectivity server container can drop before the rest of the pod terminates, which can cause network connections to drop. The following preStop definition can be added to the container to ensure it stays alive long enough for the rest of the pod to clean up.

lifecycle:
  preStop:
    exec:
      command:
        - /bin/sh
        - -c
        - sleep 70
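One detail worth checking alongside this hook: time spent in a preStop hook counts against the pod's `terminationGracePeriodSeconds` (30 seconds by default), so a `sleep 70` only delays shutdown as intended if the grace period is longer than that. The grace period of the HyperShift kube-apiserver pod is not stated in this issue. A small Python sketch of the check; `sleep_seconds`, `prestop_fits`, and the 5-second `margin` for SIGTERM handling are hypothetical, and the 70-second value comes from the snippet above.

```python
from typing import List, Optional

DEFAULT_GRACE_SECONDS = 30  # Kubernetes default terminationGracePeriodSeconds


def sleep_seconds(command: List[str]) -> int:
    """Extract N from a preStop command of the form [..., "sleep N"]."""
    words = command[-1].split()
    if len(words) != 2 or words[0] != "sleep":
        raise ValueError(f"not a plain sleep command: {command!r}")
    return int(words[1])


def prestop_fits(sleep: int, grace_period: Optional[int] = None, margin: int = 5) -> bool:
    """True if the preStop sleep leaves `margin` seconds of the grace period
    for the container to handle SIGTERM before it is killed."""
    grace = DEFAULT_GRACE_SECONDS if grace_period is None else grace_period
    return sleep + margin <= grace


delay = sleep_seconds(["/bin/sh", "-c", "sleep 70"])
print(delay, prestop_fits(delay), prestop_fits(delay, grace_period=90))
# 70 False True
```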

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/hypershift/pull/3250

Bug OCPBUGS-26993: [Driver: pd.csi.storage.gke.io] [Testpattern: Dynamic PV (block volmode)] provisioning should provision storage with pvc data source in parallel [Slow] failing

This is a clone of issue OCPBUGS-26486. The following is the description of the original issue:
—
Description of problem:

The following test started to fail frequently in the periodic tests:

External Storage [Driver: pd.csi.storage.gke.io] [Testpattern: Dynamic PV (block volmode)] provisioning should provision storage with pvc data source in parallel

Version-Release number of selected component (if applicable):

    4.15

How reproducible:

    Sometimes, but way too often in the CI

Steps to Reproduce:

    1. Run the periodic-ci-openshift-release-master-nightly-X.X-e2e-gcp-ovn-csi test

Actual results:

    Provisioning of some volumes fails with

time="2024-01-05T02:30:07Z" level=info msg="resulting interval message" message="{ProvisioningFailed  failed to provision volume with StorageClass \"e2e-provisioning-9385-e2e-scw2z8q\": rpc error: code = Internal desc = CreateVolume failed to create single zonal disk pvc-35b558d6-60f0-40b1-9cb7-c6bdfa9f28e7: failed to insert zonal disk: unknown Insert disk operation error: rpc error: code = Internal desc = operation operation-1704421794626-60e299f9dba08-89033abf-3046917a failed (RESOURCE_OPERATION_RATE_EXCEEDED): Operation rate exceeded for resource 'projects/XXXXXXXXXXXXXXXXXXXXXXXX/zones/us-central1-a/disks/pvc-501347a5-7d6f-4a32-b0e0-cf7a896f316d'. Too frequent operations from the source resource. map[reason:ProvisioningFailed]}"

Expected results:

    Test passes

Additional info:

    Looks like we're hitting the API quota limits with the test
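If the parallel provisioning calls are indeed hitting GCP operation-rate limits, one generic mitigation is to retry rate-limited operations with capped exponential backoff and jitter. The Python sketch below shows that pattern in isolation; `RateLimited`, `with_backoff`, and the delay parameters are hypothetical, and this is not necessarily what the linked operator PR changes.

```python
import random
import time


class RateLimited(Exception):
    """Stand-in for a RESOURCE_OPERATION_RATE_EXCEEDED error from the cloud API."""


def with_backoff(op, attempts=5, base=1.0, cap=30.0, sleep=time.sleep, rng=random.random):
    """Call op(), retrying RateLimited with capped exponential backoff and full jitter."""
    for attempt in range(attempts):
        try:
            return op()
        except RateLimited:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            delay = min(cap, base * 2 ** attempt)
            sleep(delay * rng())  # full jitter: uniform in [0, delay)


# Usage: an operation that is rate limited twice, then succeeds.
calls = []


def create_disk():
    calls.append(1)
    if len(calls) < 3:
        raise RateLimited()
    return "disk created"


slept = []
result = with_backoff(create_disk, sleep=slept.append, rng=lambda: 1.0)
print(result, slept)  # disk created [1.0, 2.0]
```

Jitter matters here because the test provisions many volumes at once: without it, all retries would hit the API again at the same moments.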

Failed test run example:

https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-gcp-ovn-csi/1743082616304701440

Link to Sippy:

https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&arch=amd64&baseEndTime=2023-10-31%2023%3A59%3A59&baseRelease=4.14&baseStartTime=2023-10-04%2000%3A00%3A00&capability=Dynamic%20PV%20%28block%20volmode%29&component=Storage%20%2F%20Kubernetes%20External%20Components&confidence=95&environment=ovn%20no-upgrade%20amd64%20gcp%20standard&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=ovn&network=ovn&pity=5&platform=gcp&platform=gcp&sampleEndTime=2024-01-08%2023%3A59%3A59&sampleRelease=4.15&sampleStartTime=2024-01-02%2000%3A00%3A00&testId=openshift-tests%3A7845229f6a2c8faee6573878f566d2f3&testName=External%20Storage%20%5BDriver%3A%20pd.csi.storage.gke.io%5D%20%5BTestpattern%3A%20Dynamic%20PV%20%28block%20volmode%29%5D%20provisioning%20should%20provision%20storage%20with%20pvc%20data%20source%20in%20parallel%20%5BSlow%5D&upgrade=no-upgrade&upgrade=no-upgrade&variant=standard&variant=standard

https://github.com/openshift/gcp-pd-csi-driver-operator/pull/114

Bug OCPBUGS-14718: Garbage in cloud-controller-manager status

Description of problem:

The cloud-controller-manager operator can show garbage in its status:

# oc get co cloud-controller-manager
NAME                       VERSION                                    AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
cloud-controller-manager   4.14.0-0.nightly-arm64-2023-06-07-071657   True        False         True       58m     Failed to resync for operator: 4.14.0-0.nightly-arm64-2023-06-07-071657 because &{%!e(string=failed to apply resources because TrustedCABundleControllerControllerDegraded condition is set to True)}

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-arm64-2023-06-07-071657

How reproducible:

always

Steps to Reproduce:

1. oc delete project openshift-cloud-controller-manager
2. wait a couple of minutes
3. oc get co cloud-controller-manager

Actual results:

Failed to resync for operator: 4.14.0-0.nightly-arm64-2023-06-07-071657 because &{%!e(string=failed to apply resources because TrustedCABundleControllerControllerDegraded condition is set to True)}

Expected results:

A helpful error message

Additional info:

https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/275

Bug OCPBUGS-19098: Update 4.15 baremetal-runtimecfg image to be consistent with ART

Please review the following PR: https://github.com/openshift/baremetal-runtimecfg/pull/274

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/baremetal-runtimecfg/pull/274

Bug OCPBUGS-23723: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/oc/pull/1604

Bug OCPBUGS-6513: admission web hook probe error when deploy sample KSVC based app and then modifying icon

Description of problem:

Using the web console on the RH Developer Sandbox, I created the most basic Knative Service (KSVC) using the default suggested image, i.e. openshift/hello-openshift.

Then tried to change the displayed icon using the web UI and an error about Probes was displayed. See attached images.

The error has no relevance to the item changed.

Version-Release number of selected component (if applicable):

whatever the RH sandbox uses, this value is not displayed to users

How reproducible:

very

Steps to Reproduce:

Using the web console on the RH Developer Sandbox, created the most basic Knative Service (KSVC) using the default image openshift/hello-openshift.

Then used the web UI to edit the KSVC sample and change the icon from an OpenShift logo to, for instance, a 3Scale logo.

When saving from this form, an error was reported: admission webhook 'validation.webhook.serving.knative.dev' denied the request: validation failed: must not set the field(s): spec.template.spec.containers[0].readinessProbe

Actual results:

Expected results:

Either a failure message related to changing the icon, or the icon change to take effect

Additional info:

KSVC details as provided by the web console.

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: sample
  namespace: agroom-dev
spec:
  template:
    spec:
      containers:
        - image: openshift/hello-openshift

https://github.com/openshift/console/pull/12832

Task MON-3302: Request for RHACS telemetry metrics

Request for sending data via telemetry

The goal is to collect metrics about RHACS installations to capture billing and overall usage metrics for the product. We would also like to request a backport of the telemeter config to existing OpenShift cluster versions so that telemetry metrics become available sooner, as they provide critical information to our product management.

rhacs:telemetry:rox_central_secured_clusters

rhacs:telemetry:rox_central_secured_nodes

rhacs:telemetry:rox_central_secured_vcpu

Central is the main backend component of RHACS ("hub"). The metrics show installation info about Central, as well as usage data via three gauges (secured clusters, secured nodes, secured vCPU). This is a recording rule where unnecessary labels like instance and job have already been removed.

Labels

branding: StackRox, RHACS
build: internal, release
central_id: uuid that identifies the Central instance
central_version: RHACS product version (e.g. 4.2.0)
hosting: cloud-service, self-managed
install_method: manifest, helm, rhacs-operator

rhacs:telemetry:rox_sensor_nodes

rhacs:telemetry:rox_sensor_vcpu

Sensor is a component installed on clusters managed by RHACS. The metrics show installation info about Sensor, as well as usage data via two gauges (secured nodes, secured vCPU). The cardinality of the metric series is 1. This is a recording rule where unnecessary labels like instance and job have already been removed.

branding: StackRox, RHACS
build: internal, release
central_id: uuid that identifies the Central instance
hosting: cloud-service, self-managed
install_method: manifest, helm, rhacs-operator
sensor_id: uuid that identifies the Sensor instance
sensor_version: RHACS product version (e.g. 4.2.0)

The cardinality of the metrics per cluster is 1.
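The label reduction described above can be illustrated in Python. This sketch assumes raw series arrive as (labels, value) pairs; `record` is a hypothetical stand-in for the recording rule that drops `instance` and `job`, and keeping the maximum value when series collapse onto the same label set is an assumption, not the actual rule. Counting distinct label sets after the reduction is one way to check the stated per-cluster cardinality of 1.

```python
DROPPED = {"instance", "job"}  # scrape-target labels the rule removes


def record(series):
    """Drop scrape-target labels and merge series that become identical.

    Keeping the maximum on collisions is an assumption of this sketch."""
    out = {}
    for labels, value in series:
        key = tuple(sorted((k, v) for k, v in labels.items() if k not in DROPPED))
        out[key] = max(value, out.get(key, value))
    return out


# Hypothetical: two scrapes of the same Sensor from different pod IPs, e.g. across a restart.
raw = [
    ({"sensor_id": "s-1", "central_id": "c-1", "instance": "10.0.0.5:9090", "job": "sensor"}, 12),
    ({"sensor_id": "s-1", "central_id": "c-1", "instance": "10.0.0.6:9090", "job": "sensor"}, 12),
]
recorded = record(raw)
print(len(recorded))  # 1, matching the stated per-cluster cardinality
```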

https://github.com/openshift/cluster-monitoring-operator/pull/2062

Bug OCPBUGS-18088: Remove 90s readiness probe initial delay for OVN-IC

OVN-IC doesn't use RAFT and doesn't need to wait a while for the cluster to converge. So we don't need the 90s delay for the readiness probe on the NB and SB containers anymore.

I think we only want to do this for multi-zone-interconnect though since the other deployment types would still use some RAFT.
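The proposed rule (drop the delay only where RAFT is not used) amounts to a small mode switch. A Python sketch; `readiness_initial_delay`, `patch_containers`, the mode string, and the `nbdb`/`sbdb` container names are assumptions for illustration, and only the 90-second value comes from this issue.

```python
RAFT_CONVERGENCE_DELAY = 90  # seconds: today's readiness probe initial delay


def readiness_initial_delay(mode: str) -> int:
    """Initial delay for the NB/SB database readiness probes."""
    if mode == "multi-zone-interconnect":
        return 0  # no RAFT cluster to converge, probe right away
    return RAFT_CONVERGENCE_DELAY  # other modes still use some RAFT


def patch_containers(containers: list, mode: str) -> list:
    """Set initialDelaySeconds on the (assumed) nbdb/sbdb containers in place."""
    for container in containers:
        if container.get("name") in ("nbdb", "sbdb") and "readinessProbe" in container:
            container["readinessProbe"]["initialDelaySeconds"] = readiness_initial_delay(mode)
    return containers


pods = [{"name": "nbdb", "readinessProbe": {"initialDelaySeconds": 90}},
        {"name": "northd"}]
print(patch_containers(pods, "multi-zone-interconnect"))
# [{'name': 'nbdb', 'readinessProbe': {'initialDelaySeconds': 0}}, {'name': 'northd'}]
```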

Bug OCPBUGS-24107: Update 4.15 ose-service-ca-operator-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/service-ca-operator/pull/226

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/service-ca-operator/pull/226

Bug OCPBUGS-22077: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/sdn/pull/585

Bug OCPBUGS-24680: [release-4.15] [UI] Console fields and warning to support Azure Workload Identity are not showing up

Description of problem:

    CCO supports creating a credentials request in manual mode to specify the fields required to perform short term authentication using workload identity federation but the console fields and warnings that are supposed to be present are not.

Version-Release number of selected component (if applicable):

How reproducible:

    Create a catalog containing a bundle that has the annotation to support WIF and apply it to an OIDC manual-mode Azure cluster.

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

    No warnings or additional field options for subscription are present

Expected results:

    Warnings and additional fields for subscription should be present

Additional info:

https://github.com/openshift/console/pull/13428

Story METAL-726: update libraries versions in ironic containers for OCP 4.15

https://github.com/openshift/ironic-image/pull/412

Bug OCPBUGS-18788: kube-apiserver bound to port 60000 prevented metal3-baremetal-operator from starting

Description of problem:

metal3-baremetal-operator-7ccb58f44b-xlnnd pod failed to start on the SNO baremetal dualstack cluster:

Events:
  Type     Reason                  Age                    From               Message
  ----     ------                  ----                   ----               -------
  Normal   Scheduled               34m                    default-scheduler  Successfully assigned openshift-machine-api/metal3-baremetal-operator-7ccb58f44b-xlnnd to sno.ecoresno.lab.eng.tlv2.redhat.com
  Warning  FailedScheduling        34m                    default-scheduler  0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports. preemption: 0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports..
  Warning  FailedCreatePodSandBox  34m                    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to add hostport mapping for sandbox k8s_metal3-baremetal-operator-7ccb58f44b-xlnnd_openshift-machine-api_5f6d8c69-a508-47f3-a6b1-7701b9d3617e_0(c4a8b353e3ec105d2bff2eb1670b82a0f226ac1088b739a256deb9dfae6ebe54): cannot open hostport 60000 for pod k8s_metal3-baremetal-operator-7ccb58f44b-xlnnd_openshift-machine-api_5f6d8c69-a508-47f3-a6b1-7701b9d3617e_0_: listen tcp4 :60000: bind: address already in use
  Warning  FailedCreatePodSandBox  34m                    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to add hostport mapping for sandbox k8s_metal3-baremetal-operator-7ccb58f44b-xlnnd_openshift-machine-api_5f6d8c69-a508-47f3-a6b1-7701b9d3617e_0(9e6960899533109b02fbb569c53d7deffd1ac8185cef3d8677254f9ccf9387ff): cannot open hostport 60000 for pod k8s_metal3-baremetal-operator-7ccb58f44b-xlnnd_openshift-machine-api_5f6d8c69-a508-47f3-a6b1-7701b9d3617e_0_: listen tcp4 :60000: bind: address already in use

Version-Release number of selected component (if applicable):

4.14.0-rc.0

How reproducible:

so far once

Steps to Reproduce:

1. Deploy disconnected baremetal SNO node with dualstack networking with agent-based installer
2.
3.

Actual results:

metal3-baremetal-operator pod fails to start

Expected results:

metal3-baremetal-operator pod is running

Additional info:

Checking the ports on the node showed that the `kube-apiserver` process was bound to the port:

tcp   ESTAB      0      0                                                [::1]:60000                        [::1]:2379    users:(("kube-apiserver",pid=43687,fd=455))
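The `ss` output shows port 60000 used as the local end of kube-apiserver's outgoing connection to etcd on 2379, which is consistent with 60000 falling inside the kernel's ephemeral port range (the Linux default for `net.ipv4.ip_local_port_range` is 32768-60999, and it also applies to IPv6). A Python sketch that checks a requested hostPort against that range; `parse_port_range` and `hostport_at_risk` are hypothetical helpers, and on a live node the range text comes from /proc/sys/net/ipv4/ip_local_port_range.

```python
from pathlib import Path
from typing import Optional, Tuple

PORT_RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")
LINUX_DEFAULT_RANGE = "32768\t60999"  # kernel default for ip_local_port_range


def parse_port_range(text: str) -> Tuple[int, int]:
    low, high = (int(part) for part in text.split())
    return low, high


def hostport_at_risk(port: int, range_text: Optional[str] = None) -> bool:
    """True if the kernel may hand `port` out as an ephemeral source port,
    so an outgoing connection (like kube-apiserver -> etcd) can occupy it."""
    if range_text is None:
        range_text = (PORT_RANGE_FILE.read_text() if PORT_RANGE_FILE.exists()
                      else LINUX_DEFAULT_RANGE)
    low, high = parse_port_range(range_text)
    return low <= port <= high


print(hostport_at_risk(60000, LINUX_DEFAULT_RANGE))  # True
```

The kernel's `net.ipv4.ip_local_reserved_ports` setting can exclude specific ports from ephemeral allocation; whether the linked PR takes that route or moves the port elsewhere is not stated here.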


After rebooting the node all pods started as expected

https://github.com/openshift/cluster-baremetal-operator/pull/361

Bug OCPBUGS-18761: revert "force cert rotation every couple days for development" in 4.15

Description of problem:

revert "force cert rotation every couple days for development" in 4.15

Below are the steps to verify this bug:

# oc adm release info --commits registry.ci.openshift.org/ocp/release:4.11.0-0.nightly-2022-06-25-081133|grep -i cluster-kube-apiserver-operator
  cluster-kube-apiserver-operator                https://github.com/openshift/cluster-kube-apiserver-operator                7764681777edfa3126981a0a1d390a6060a840a3

# git log --date local --pretty="%h %an %cd - %s" 776468 |grep -i "#1307"
08973b820 openshift-ci[bot] Thu Jun 23 22:40:08 2022 - Merge pull request #1307 from tkashem/revert-cert-rotation

# oc get clusterversions.config.openshift.io 
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-06-25-081133   True        False         64m     Cluster version is 4.11.0-0.nightly-2022-06-25-081133

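$ cat scripts/check_secret_expiry.sh
#!/usr/bin/env bash
# usage: ./check_secret_expiry.sh certs.txt
# Each non-comment line of the input file is "<namespace> <secret-name>".
FILE="$1"
if [ ! -f "$FILE" ]; then
  echo "must provide \$1" >&2; exit 1
fi
# Read from fd 3 so commands inside the loop cannot consume the input file.
while read -r NS SECRET _ <&3; do
  # Skip blank lines and comment lines
  [[ -z "$NS" || "$NS" == \#* ]] && continue
  # oc extract writes the secret's keys (tls.crt, tls.key) to the current directory
  rm -f tls.crt; oc extract "secret/$SECRET" -n "$NS" --confirm > /dev/null
  echo "Check cert dates of $SECRET in project $NS:"
  openssl x509 -noout -dates -in tls.crt; echo
done 3< "$FILE"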

$ cat certs.txt
openshift-kube-controller-manager-operator csr-signer-signer
openshift-kube-controller-manager-operator csr-signer
openshift-kube-controller-manager kube-controller-manager-client-cert-key
openshift-kube-apiserver-operator aggregator-client-signer
openshift-kube-apiserver aggregator-client
openshift-kube-apiserver external-loadbalancer-serving-certkey
openshift-kube-apiserver internal-loadbalancer-serving-certkey
openshift-kube-apiserver service-network-serving-certkey
openshift-config-managed kube-controller-manager-client-cert-key
openshift-config-managed kube-scheduler-client-cert-key
openshift-kube-scheduler kube-scheduler-client-cert-key

Checking the certs: several of them have one-day expiry times, as expected.
# ./check_secret_expiry.sh certs.txt
Check cert dates of csr-signer-signer in project openshift-kube-controller-manager-operator:
notBefore=Jun 27 04:41:38 2022 GMT
notAfter=Jun 28 04:41:38 2022 GMT

Check cert dates of csr-signer in project openshift-kube-controller-manager-operator:
notBefore=Jun 27 04:52:21 2022 GMT
notAfter=Jun 28 04:41:38 2022 GMT

Check cert dates of kube-controller-manager-client-cert-key in project openshift-kube-controller-manager:
notBefore=Jun 27 04:52:26 2022 GMT
notAfter=Jul 27 04:52:27 2022 GMT

Check cert dates of aggregator-client-signer in project openshift-kube-apiserver-operator:
notBefore=Jun 27 04:41:37 2022 GMT
notAfter=Jun 28 04:41:37 2022 GMT

Check cert dates of aggregator-client in project openshift-kube-apiserver:
notBefore=Jun 27 04:52:26 2022 GMT
notAfter=Jun 28 04:41:37 2022 GMT

Check cert dates of external-loadbalancer-serving-certkey in project openshift-kube-apiserver:
notBefore=Jun 27 04:52:26 2022 GMT
notAfter=Jul 27 04:52:27 2022 GMT

Check cert dates of internal-loadbalancer-serving-certkey in project openshift-kube-apiserver:
notBefore=Jun 27 04:52:49 2022 GMT
notAfter=Jul 27 04:52:50 2022 GMT

Check cert dates of service-network-serving-certkey in project openshift-kube-apiserver:
notBefore=Jun 27 04:52:28 2022 GMT
notAfter=Jul 27 04:52:29 2022 GMT

Check cert dates of kube-controller-manager-client-cert-key in project openshift-config-managed:
notBefore=Jun 27 04:52:26 2022 GMT
notAfter=Jul 27 04:52:27 2022 GMT

Check cert dates of kube-scheduler-client-cert-key in project openshift-config-managed:
notBefore=Jun 27 04:52:47 2022 GMT
notAfter=Jul 27 04:52:48 2022 GMT

Check cert dates of kube-scheduler-client-cert-key in project openshift-kube-scheduler:
notBefore=Jun 27 04:52:47 2022 GMT
notAfter=Jul 27 04:52:48 2022 GMT
# 
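The one-day versus 30-day lifetimes above can also be computed directly from the `notBefore`/`notAfter` strings. A minimal sketch, assuming GNU `date` (whose `-d` option parses OpenSSL's date format); `cert_lifetime_hours` and `cert_file_lifetime_hours` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the lifetime in whole hours between two
# OpenSSL-style dates, e.g. "Jun 27 04:41:38 2022 GMT". Requires GNU date.
cert_lifetime_hours() {
  local not_before not_after
  not_before=$(date -u -d "$1" +%s) || return 1
  not_after=$(date -u -d "$2" +%s) || return 1
  echo $(( (not_after - not_before) / 3600 ))
}

# Same calculation read straight from a PEM file such as the extracted tls.crt.
# `openssl x509 -startdate` / `-enddate` print "notBefore=..." / "notAfter=...".
cert_file_lifetime_hours() {
  local nb na
  nb=$(openssl x509 -noout -startdate -in "$1" | cut -d= -f2) || return 1
  na=$(openssl x509 -noout -enddate -in "$1" | cut -d= -f2) || return 1
  cert_lifetime_hours "$nb" "$na"
}
```

For the csr-signer-signer dates above this prints 24, and for kube-controller-manager-client-cert-key it prints 720. csr-signer prints 23 because it was issued about 11 minutes after its signer but expires at the same time, and the integer division truncates.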

# cat check_secret_expiry_within.sh
#!/usr/bin/env bash
# usage: ./check_secret_expiry_within.sh 1day # or 15min, 2days, 2day, 2month, 1year
WITHIN=${1:-24hours}
echo "Checking validity within $WITHIN ..."
oc get secret --insecure-skip-tls-verify -A -o json | jq -r '.items[] | select(.metadata.annotations."auth.openshift.io/certificate-not-after" | . != null and fromdateiso8601<='$( date --date="+$WITHIN" +%s )') | "\(.metadata.annotations."auth.openshift.io/certificate-not-before")  \(.metadata.annotations."auth.openshift.io/certificate-not-after")  \(.metadata.namespace)\t\(.metadata.name)"'

# ./check_secret_expiry_within.sh 1day
Checking validity within 1day ...
2022-06-27T04:41:37Z  2022-06-28T04:41:37Z  openshift-kube-apiserver-operator	aggregator-client-signer
2022-06-27T04:52:26Z  2022-06-28T04:41:37Z  openshift-kube-apiserver	aggregator-client
2022-06-27T04:52:21Z  2022-06-28T04:41:38Z  openshift-kube-controller-manager-operator	csr-signer
2022-06-27T04:41:38Z  2022-06-28T04:41:38Z  openshift-kube-controller-manager-operator	csr-signer-signer

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/cluster-kube-apiserver-operator/pull/1598

Bug OCPBUGS-19256: Update 4.15 kube-state-metrics image to be consistent with ART

Please review the following PR: https://github.com/openshift/kube-state-metrics/pull/97

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/kube-state-metrics/pull/97

Bug OCPBUGS-20403: UPI playbook is missing sg rules for compact cluster

Description of problem:

Master-only installations (workers set to 0 replicas) should be supported in UPI. Currently, the ingress rules that are enabled on workers are not also enabled on masters.

Context: https://bugzilla.redhat.com/show_bug.cgi?id=1955544

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/installer/pull/7576

Bug OCPBUGS-24161: Update 4.15 ose-cluster-image-registry-operator-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-image-registry-operator/pull/966

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-image-registry-operator/pull/966

Story TRT-1339: Ignore openshift-dns TopologyAwareHintsDisabled when nodes tainted in serial jobs

Payload https://amd64.ocp.releases.ci.openshift.org/releasestream/4.15.0-0.nightly/release/4.15.0-0.nightly-2023-10-26-222533 failed because there were no successful runs of techpreview-serial. It looks like all of them failed on:

[sig-arch] events should not repeat pathologically for ns/openshift-dns

{  2 events happened too frequently

event happened 22 times, something is wrong: ns/openshift-dns service/dns-default hmsg/6f6ed749fd - pathological/true reason/TopologyAwareHintsDisabled Unable to allocate minimum required endpoints to each zone without exceeding overload threshold (4 endpoints, 2 zones), addressType: IPv4 From: 23:11:05Z To: 23:11:06Z result=reject 
event happened 23 times, something is wrong: ns/openshift-dns service/dns-default hmsg/6f6ed749fd - pathological/true reason/TopologyAwareHintsDisabled Unable to allocate minimum required endpoints to each zone without exceeding overload threshold (4 endpoints, 2 zones), addressType: IPv4 From: 23:11:06Z To: 23:11:07Z result=reject }

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-aws-sdn-techpreview-serial/1717669829478977536

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-gcp-sdn-techpreview-serial/1717669755634061312

https://github.com/openshift/origin/pull/28381

Bug OCPBUGS-19224: Update 4.15 ose-csi-external-resizer image to be consistent with ART

Please review the following PR: https://github.com/openshift/csi-external-resizer/pull/144

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/csi-external-resizer/pull/144

Bug OCPBUGS-25309: [4.15] don't find "scrape.timestamp-tolerance" setting in prometheus

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/cluster-monitoring-operator/pull/2197

Bug OCPBUGS-12772: Cinder CSI metadata requests can be affected by proxy configuration

Description of problem:

Reported upstream in https://github.com/kubernetes/cloud-provider-openstack/issues/2217

Not specifically reproduced in OpenShift, but I have no reason to think we would not be affected, and I know we have users with strict proxy requirements.

The user's configuration requires all OpenStack API requests from the tenant network to go through a proxy. They have configured a proxy 'globally' in their cluster in a manner which also affects the CSI driver.

Attempting to attach a volume to a pod fails. Inspecting the logs, we see that Cinder attempted to attach the volume to the proxy server, not to the node hosting the pod. This is because the metadata request was also proxied, so the returned values relate to the proxy server rather than the local server.

Version-Release number of selected component (if applicable):

4.13, but likely all versions since we enabled CSI

How reproducible:

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/cloud-provider-openstack/pull/192

Bug OCPBUGS-21773: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/ovn-kubernetes/pull/1958

Bug OCPBUGS-24062: network-node-identity does not honor restart annotation

https://github.com/openshift/hypershift/pull/3245

Bug OCPBUGS-26927: Frequent SAST false positives

This is a clone of issue OCPBUGS-26765. The following is the description of the original issue:
—
Description of problem:

The SAST scans keep coming up with bogus positive results from test and vendor files. This bug is just a placeholder to allow us to backport the change to ignore those files.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/baremetal-runtimecfg/pull/293

Bug OCPBUGS-19229: Update 4.15 ose-gcp-pd-csi-driver-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/gcp-pd-csi-driver-operator/pull/85

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/gcp-pd-csi-driver-operator/pull/85

Bug OCPBUGS-24133: Update 4.15 coredns-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/coredns/pull/106

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/coredns/pull/106

Bug OCPBUGS-24385: Align status and assignee between jira and github in predispatch script

In the Python script used during bug pre-dispatch, we should align status and assignee between Jira and GitHub, keeping GitHub as the source of truth:

For a given GitHub issue, the Jira story that tracks it must show the same status (open / closed) and the same assignee; if it does not, the script aligns it.
If more than one Jira story tracks the same GitHub issue, the Jira story with the lowest ID is kept and all the others are closed.
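The lowest-ID rule can be sketched as follows. The real pre-dispatch tooling is a Python script; this shell/awk version only illustrates the deduplication step, assumes input lines of the form `<github-issue> <jira-key>`, and uses the hypothetical helper name `dedupe_jira_links`. It compares the numeric part of each key, so that a key ending in -7 counts as lower than one ending in -20.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "<github-issue> <jira-key>" pairs on stdin and
# prints "keep <key>" for the lowest-numbered Jira key tracking each GitHub
# issue and "close <key>" for every other key tracking the same issue.
dedupe_jira_links() {
  awk '
    NF >= 2 {
      issue = $1; key = $2
      n = key; sub(/^.*-/, "", n); n += 0   # numeric suffix, e.g. SDN-20 -> 20
      keys[issue, ++count[issue]] = key
      if (!(issue in best) || n < bestnum[issue]) {
        best[issue] = key; bestnum[issue] = n
      }
    }
    END {
      for (issue in count)
        for (i = 1; i <= count[issue]; i++) {
          k = keys[issue, i]
          action = (k == best[issue]) ? "keep" : "close"
          print action, k
        }
    }' | LC_ALL=C sort   # awk array iteration order is unspecified, so sort
}
```

Status and assignee alignment are deliberately left out; they would need the Jira and GitHub APIs.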

https://github.com/openshift/network-tools/pull/100

Bug OCPBUGS-18071: Ignore headless services in ovnkube-node when restarting and syncing services

Description of problem:

ovnkube-node fails to start on a customer cluster (see OHSS-26032), and the error message doesn't state at which step of the startup process it stops (or which Service or other object defined on the cluster is involved).

Version-Release number of selected component (if applicable):

How reproducible:

Unknown. After a Force Rebuild of the OVN databases, ovnkube-node doesn't start.
The issue seems to be caused by a headless Service with internalTrafficPolicy: Local, which isn't allowed according to https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/2086-service-internal-traffic-policy/README.md#proposal
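To look for Services matching this description, one can filter for headless Services (`spec.clusterIP: None`) that also set `internalTrafficPolicy: Local`. A minimal sketch, assuming `jq` is available; `find_local_headless_services` is a hypothetical helper that reads `oc get svc -A -o json` output on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a Service list (JSON) on stdin and prints
# "<namespace>/<name>" for each headless Service with internalTrafficPolicy Local.
find_local_headless_services() {
  jq -r '.items[]
    | select(.spec.clusterIP == "None" and .spec.internalTrafficPolicy == "Local")
    | "\(.metadata.namespace)/\(.metadata.name)"'
}

# Example against a live cluster:
#   oc get svc -A -o json | find_local_headless_services
```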

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/ovn-kubernetes/pull/1923

Bug OCPBUGS-22742: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-monitoring-operator/pull/2138

Bug OCPBUGS-24932: Update 4.16 ose-cluster-update-keys-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-update-keys/pull/53

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-update-keys/pull/53

Bug OU-312: the monitoring-plugin Dockerfile used by the CI is misconfigured

This PR fixed a bug related to the nginx default assets directory in the Dockerfile. The fix was not backported to 4.15, which causes OCP consoles launched with CI images (for example, via cluster bot) to fail to display the Observe menu. Backporting the fix resolves the issue for 4.15 CI images.

https://github.com/openshift/monitoring-plugin/pull/93

Bug OCPBUGS-19111: Update 4.15 ose-alibaba-disk-csi-driver-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/alibaba-disk-csi-driver-operator/pull/61

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/alibaba-disk-csi-driver-operator/pull/61

Bug OCPBUGS-20164: builds.config.openshift.io CRD is available in a cluster with baselineCapabilitySet None

Description of problem:

A cluster installed with baselineCapabilitySet: None has the build resource available even though the Build capability is disabled.


❯ oc get -o json clusterversion version | jq '.spec.capabilities'                      
{
  "baselineCapabilitySet": "None"
}

❯ oc get -o json clusterversion version | jq '.status.capabilities.enabledCapabilities'
null

❯ oc get build -A                   
NAME      AGE
cluster   5h23m

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-10-04-143709

How reproducible:

100%

Steps to Reproduce:

1. Install a cluster with baselineCapabilitySet: None

Actual results:

❯ oc get build -A                   
NAME      AGE
cluster   5h23m

Expected results:

❯ oc get -A build
error: the server doesn't have a resource type "build"
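A quick way to compare the two states in this report is to ask whether a capability appears in `status.capabilities.enabledCapabilities`. A minimal sketch, assuming `jq` 1.5 or later; `capability_enabled` is a hypothetical helper that reads `oc get clusterversion version -o json` output on stdin and treats a null list (as in the output above) as empty.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exits 0 if capability $1 is listed in the
# ClusterVersion's status.capabilities.enabledCapabilities, 1 otherwise.
capability_enabled() {
  jq -e --arg cap "$1" \
    '(.status.capabilities.enabledCapabilities // []) | any(.[]; . == $cap)' \
    > /dev/null
}

# Example against a live cluster:
#   oc get clusterversion version -o json | capability_enabled Build \
#     || echo "Build capability disabled; builds.config.openshift.io should be absent"
```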

slack thread with more info: https://redhat-internal.slack.com/archives/CF8SMALS1/p1696527133380269

Bug OCPBUGS-22767: pod IP routing broken if KubeVirt VM migration fails

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:

Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.

Affected Platforms:

Is it an

internal CI failure
customer issue / SD
internal RedHat testing failure

If it is an internal RedHat testing failure:

Please share a kubeconfig or creds to a live cluster for the assignee to debug/troubleshoot along with reproducer steps (especially if it's a telco use case like ICNI, secondary bridges or BM+kubevirt).

If it is a CI failure:

Did it happen in different CI lanes? If so please provide links to multiple failures with the same error instance
Did it happen in both sdn and ovn jobs? If so please provide links to multiple failures with the same error instance
Did it happen in other platforms (e.g. aws, azure, gcp, baremetal etc) ? If so please provide links to multiple failures with the same error instance
When did the failure start happening? Please provide the UTC timestamp of the networking outage window from a sample failure run
If it's a connectivity issue,
What is the srcNode, srcIP and srcNamespace and srcPodName?
What is the dstNode, dstIP and dstNamespace and dstPodName?
What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)

If it is a customer / SD issue:

Provide enough information in the bug description that Engineering doesn’t need to read the entire case history.
Don’t presume that Engineering has access to Salesforce.
Please provide must-gather and sos-report with an exact link to the comment in the support case with the attachment. The format should be: https://access.redhat.com/support/cases/#/case/<case number>/discussion?attachmentId=<attachment id>
Describe what each attachment is intended to demonstrate (failed pods, log errors, OVS issues, etc).
Referring to the attached must-gather, sosreport or other attachment, please provide the following details:
- If the issue is in a customer namespace then provide a namespace inspect.
- If it is a connectivity issue:
  - What is the srcNode, srcNamespace, srcPodName and srcPodIP?
  - What is the dstNode, dstNamespace, dstPodName and dstPodIP?
  - What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)
  - Please provide the UTC timestamp networking outage window from must-gather
  - Please provide tcpdump pcaps taken during the outage filtered based on the above provided src/dst IPs
- If it is not a connectivity issue:
  - Describe the steps taken so far to analyze the logs from networking components (cluster-network-operator, OVNK, SDN, openvswitch, ovs-configure etc) and the actual component where the issue was seen based on the attached must-gather. Please attach snippets of relevant logs around the window when problem has happened if any.

For OCPBUGS in which the issue has been identified, label with “sbr-triaged”
For OCPBUGS in which the issue has not been identified and needs Engineering help for root cause, label with “sbr-untriaged”
Note: bugs that do not meet these minimum standards will be closed with label “SDN-Jira-template”

https://github.com/openshift/ovn-kubernetes/pull/1952

Bug OCPBUGS-18386: Cluster Version Operator does not correctly reconcile SCC resources

How reproducible:

Always

Steps to Reproduce:

1. The Kubernetes API introduces a new Pod Template parameter (`ephemeral`)
2. This parameter is not in the allowed list of the default SCC
3. Customers are not allowed to edit the default SCCs, and AFAIK we do not have a mechanism in place to update the built-in SCCs
4. Users of existing clusters cannot use the new parameter without manually creating SCCs and assigning them to service accounts themselves, which is clunky. This is documented in https://access.redhat.com/articles/6967808

Actual results:

Users of existing clusters cannot use ephemeral volumes after an upgrade

Expected results:

Users of existing clusters *can* use ephemeral volumes after an upgrade

Current status

https://github.com/openshift/cluster-version-operator/pull/966

Bug OCPBUGS-19134: Update 4.15 ose-gcp-pd-csi-driver image to be consistent with ART

Please review the following PR: https://github.com/openshift/gcp-pd-csi-driver/pull/42

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/gcp-pd-csi-driver/pull/42

Bug OCPBUGS-22747: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/console/pull/13292

Bug OCPBUGS-22930: Remove collapsible toggle for conditional update risk details

Description of problem:

When a user selects a supported-but-not-recommended update target, it's currently rendered as a DropdownWithSwitch that is collapsed by default. That forces the user to perform an extra click to see the message explaining the risk they are considering accepting. We should remove the toggle and always expand that message, because understanding the risk is a critical part of deciding whether you accept it.

Version-Release number of selected component (if applicable):

Every release since the console landed support for conditional update risks. Not a big enough deal to backport all the way.

How reproducible:

Every time.

Steps to Reproduce:

~~OTA-520~~ explains how to create dummy data for testing the conditional update UX pre-merge and/or on nightly builds that are not part of the usual channels yet.

Actual results:

Expected results:

but without the down-v, because the text should not be collapsible.

https://github.com/openshift/console/pull/13306

Story CONSOLE-3084: [OCM] on-cluster console should disable update buttons in managed clusters

The on-cluster console of managed OpenShift (OSD, ROSA) should have its update buttons greyed out (disabled) so that customers don't hit the error caused by webhooks blocking updates, since OSD and ROSA clusters must be updated through the OCM UI or the ROSA CLI.

Because managed services govern when specific update versions are allowed, this change would support that policy without letting the user encounter an unnecessary error.

https://github.com/openshift/console/pull/13184

Bug OCPBUGS-9422: Telemetry: Current page was sometimes not tracked when reloading the current page

Description of problem:
We want to understand our users, but the first page the user opens wasn't tracked.

Version-Release number of selected component (if applicable):
Seen on Dev Sandbox with 4.10 and 4.11 with telemetry enabled

How reproducible:
Sometimes. Looks like a race condition; requires telemetry to be enabled

Steps to Reproduce:
1. Open the browser network inspector and filter for segment
2. Open the developer console

Actual results:
1-2 identity events are sent, but no page event

Expected results:
At least one identity event and at least one page event should be sent to Segment

Additional info:

https://github.com/openshift/console/pull/13088

Bug OCPBUGS-24212: Add ownership notifications to TLS artifacts

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/cluster-monitoring-operator/pull/2158

Bug OCPBUGS-22069: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/machine-api-provider-openstack/pull/89

Bug OCPBUGS-24117: Update 4.15 ose-baremetal-operator-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/baremetal-operator/pull/323

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/baremetal-operator/pull/323

Bug OCPBUGS-21730: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/csi-driver-shared-resource/pull/145

Bug OCPBUGS-22664: [IPI] coredns-monitor continuously reporting "Failed to read ip from file /run/nodeip-configuration/ipv4"

Description of problem:

After installing an OpenShift IPI vSphere cluster, the coredns-monitor containers in the "openshift-vsphere-infra" namespace continuously report the message: "Failed to read ip from file /run/nodeip-configuration/ipv4" error="open /run/nodeip-configuration/ipv4: no such file or directory". The file "/run/nodeip-configuration/ipv4" exists on the nodes but is not actually mounted into the coredns pods. This does not appear to affect the functionality of the cluster, but a "failed" message in the container logs can trigger alarms or investigations into a nonexistent problem in the cluster.

Version-Release number of selected component (if applicable):

Any 4.12, 4.13, 4.14

How reproducible:

Always

Steps to Reproduce:

1. Install OpenShift IPI vSphere cluster
2. Wait for the installation to complete
3. Read the logs of any coredns-monitor container in the "openshift-vsphere-infra" namespace

Actual results:

coredns-monitor continuously reports the failure message, misleading cluster administrators into searching for a real issue.

Expected results:

coredns-monitor should not report this failure message if there is nothing that needs to be fixed.

Additional info:

The same issue happens in Baremetal IPI clusters.

https://github.com/openshift/machine-config-operator/pull/4058

Bug HOSTEDCP-1281: Creating of HostedCluster fails on webhook

When creating a HostedCluster from the CLI with the KubeVirt platform and an external infra-cluster, creation fails with this message:

hypershift_framework.go:223: failed to create cluster, tearing down: failed to apply object "e2e-clusters-jqrxx/example-kk2sm": admission webhook "hostedclusters.hypershift.openshift.io" denied the request: Secret "example-kk2sm-infra-credentials" not found

This happens because the HostedCluster CR is created before the kubeconfig secret of the external infra-cluster. The HostedCluster creation webhook tries to access the external infra-cluster and fails to find the secret, which has not been created yet.

https://github.com/openshift/hypershift/pull/3164

Bug OCPBUGS-17003: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-baremetal-operator/pull/365

Bug OCPBUGS-24073: Update 4.15 prometheus-operator-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/prometheus-operator/pull/258

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/prometheus-operator/pull/258

Bug OCPBUGS-20229: Service details page shows revisions and routes from other service also

Description of problem:

In the newly added Revisions and Routes tabs on the service details page, details of other services are also displayed. The tabs should filter for the selected service.

Version-Release number of selected component (if applicable):

4.15

How reproducible:

Always

Steps to Reproduce:

1. Install the Serverless Operator
2. Create a serving instance
3. Create multiple services in a namespace
4. Click any service and go to the Revisions, Routes and Pods tabs

Actual results:

Revisions and routes from other services are also displayed

Expected results:

Revisions and routes for that particular service should be displayed

Additional info:

https://github.com/openshift/console/pull/13221

Bug OCPBUGS-23397: Sync openshift-apiserver's shutdown-delay-duration with core offering

Description of problem:

The shutdown-delay-duration argument for the openshift-apiserver is set to 3s in hypershift, but set to 15s in core openshift. Hypershift should update the value to match.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1. Diff the openshift-apiserver configs

Actual results:

https://github.com/openshift/hypershift/blob/3a42e77041535c8ac8012856d279bc782efcaf3c/control-plane-operator/controllers/hostedcontrolplane/oapi/config.go#L59C1-L60C1

Expected results:

https://github.com/openshift/cluster-openshift-apiserver-operator/commit/cad9746b62abf3b3230592d45f7f60bcecc96dac

Additional info:

https://github.com/openshift/hypershift/pull/3204

Bug OCPBUGS-23927: idp table line is missing

Description of problem:

On the OAuth page (/k8s/cluster/config.openshift.io~v1~OAuth/cluster), the IDP list no longer has table lines.

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2023-11-22-204142

How reproducible:

Always

Steps to Reproduce:

    1. Check the OAuth page (/k8s/cluster/config.openshift.io~v1~OAuth/cluster)
    2.
    3.

Actual results:

1. Table lines are missing from the IDP list

Expected results:

1. The IDP list should be shown with table lines

Additional info:

screenshot: https://drive.google.com/file/d/1xmF5_RYZtAfcfY57kWi9ttcahKFFd_Kc/view?usp=sharing

https://github.com/openshift/console/pull/13372

Bug OCPBUGS-21630: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/thanos/pull/123

Bug OCPBUGS-21940: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/csi-driver-manila-operator/pull/206

Bug OCPBUGS-25241: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-storage-operator/pull/433

Task MGMT-16045: fix http2 CVE-2023-44487

OCPBUGS-20385

https://github.com/openshift/assisted-service/pull/5614

Bug OCPBUGS-23794: Shipwright builds decorator is not visible in topology view in the local setup

Description of problem:

When creating Deployments/DeploymentConfigs and associated Shipwright builds, the decorators associated with the node in the Topology view are not visible.

Version-Release number of selected component (if applicable):

How reproducible:

Always

Steps to Reproduce:

    1. Install pipeline and shipwright operator
    2. Create deployment with build runs
    3. Run the cluster in the local setup
    4. Go to topology where deployments are created

Actual results:

No decorators are visible.

Expected results:

Decorators should be visible

Additional info:

https://github.com/openshift/console/pull/13376

Bug OCPBUGS-24027: forbidden access to resource on shared-resource-csi-driver-operator

The e2e-aws-serial-techpreview lane under openshift/api is failing:
shared-resource-csi-driver-operator fails with:

failed to list *v1.APIServer: apiservers.config.openshift.io is forbidden: User "system:serviceaccount:openshift-cluster-csi-drivers:shared-resource-csi-driver-operator" cannot list resource "apiservers" in API group "config.openshift.io" at the cluster scope

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_api/[…]f-rzg8q_shared-resource-csi-driver-operator.log

https://github.com/openshift/cluster-storage-operator/pull/425

Bug OCPBUGS-7465: oc-mirror will hit 401 code after hang a while

Description of problem:

When running the command `oc-mirror list operators --catalog=registry.redhat.io/redhat/certified-operator-index:v4.12 -v 9`, the response code is 200 OK at first; after the command hangs for a while, it receives response code 401.

Version-Release number of selected component (if applicable):

How reproducible:

Sometimes

Steps to Reproduce:

Using the advanced cluster management package as an example.

1. oc-mirror list operators --catalog=registry.redhat.io/redhat/certified-operator-index:v4.12 -v 9

Actual results: After hanging for a while, the command receives a 401 code. It seems that when oc-mirror retries after the timeout, it does not read the credentials again.

level=debug msg=fetch response received digest=sha256:a67257cfe913ad09242bf98c44f2330ec7e8261ca3a8db3431cb88158c3d4837 mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip response.header.accept-ranges=bytes response.header.age=714959 response.header.connection=keep-alive response.header.content-length=80847073 response.header.content-type=binary/octet-stream response.header.date=Mon, 06 Feb 2023 06:52:06 GMT response.header.etag="a428fafd37ee58f4bdeae1a7ff7235b5-1" response.header.last-modified=Fri, 16 Sep 2022 17:54:09 GMT response.header.server=AmazonS3 response.header.via=1.1 010c0731b9775a983eceaec0f5fa6a2e.cloudfront.net (CloudFront) response.header.x-amz-cf-id=rEfKWnJdasWIKnjWhYyqFn9eHY8v_3Y9WwSRnnkMTkPayHlBxWX1EQ== response.header.x-amz-cf-pop=HIO50-C1 response.header.x-amz-replication-status=COMPLETED response.header.x-amz-server-side-encryption=AES256 response.header.x-amz-storage-class=INTELLIGENT_TIERING response.header.x-amz-version-id=GfqTTjWbdqB0sreyjv3fyo1k6LQ9kZKC response.header.x-cache=Hit from cloudfront response.status=200 OK size=80847073 url=https://registry.redhat.io/v2/redhat/certified-operator-index/blobs/sha256:a67257cfe913ad09242bf98c44f2330ec7e8261ca3a8db3431cb88158c3d4837
level=debug msg=fetch response received digest=sha256:d242c7b4380d3c9db3ac75680c35f5c23639a388ad9313f263d13af39a9c8b8b mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip response.header.accept-ranges=bytes response.header.age=595868 response.header.connection=keep-alive response.header.content-length=98028196 response.header.content-type=binary/octet-stream response.header.date=Tue, 07 Feb 2023 15:56:56 GMT response.header.etag="f702c84459b479088565e4048a890617-1" response.header.last-modified=Wed, 18 Jan 2023 06:55:12 GMT response.header.server=AmazonS3 response.header.via=1.1 7f5e0d3b9ea85d0d75063a66c0ebc840.cloudfront.net (CloudFront) response.header.x-amz-cf-id=Tw9cjJjYCy8idBiQ1PvljDkhAoEDEzuDCNnX6xJub4hGeh8V0CIP_A== response.header.x-amz-cf-pop=HIO50-C1 response.header.x-amz-replication-status=COMPLETED response.header.x-amz-server-side-encryption=AES256 response.header.x-amz-storage-class=INTELLIGENT_TIERING response.header.x-amz-version-id=nt7yY.YmjWF0pfAhzh_fH2xI_563GnPz response.header.x-cache=Hit from cloudfront response.status=200 OK size=98028196 url=https://registry.redhat.io/v2/redhat/certified-operator-index/blobs/sha256:d242c7b4380d3c9db3ac75680c35f5c23639a388ad9313f263d13af39a9c8b8b
level=debug msg=fetch response received digest=sha256:664a8226a152ea0f1078a417f2ec72d3a8f9971e8a374859b486b60049af9f18 mediatype=application/vnd.docker.container.image.v1+json response.header.accept-ranges=bytes response.header.age=17430 response.header.connection=keep-alive response.header.content-length=24828 response.header.content-type=binary/octet-stream response.header.date=Tue, 14 Feb 2023 08:37:35 GMT response.header.etag="57eb6fdca8ce82a837bdc2cebadc3c7b-1" response.header.last-modified=Mon, 13 Feb 2023 16:11:57 GMT response.header.server=AmazonS3 response.header.via=1.1 0c96ded7ff282d2dbcf47c918b6bb500.cloudfront.net (CloudFront) response.header.x-amz-cf-id=w9zLDWvPJ__xbTpI8ba5r9DRsFXbvZ9rSx5iksG7lFAjWIthuokOsA== response.header.x-amz-cf-pop=HIO50-C1 response.header.x-amz-replication-status=COMPLETED response.header.x-amz-server-side-encryption=AES256 response.header.x-amz-version-id=Enw8mLebn4.ShSajtLqdo4riTDHnVEFZ response.header.x-cache=Hit from cloudfront response.status=200 OK size=24828 url=https://registry.redhat.io/v2/redhat/certified-operator-index/blobs/sha256:664a8226a152ea0f1078a417f2ec72d3a8f9971e8a374859b486b60049af9f18
level=debug msg=fetch response received digest=sha256:130c9d0ca92e54f59b68c4debc5b463674ff9555be1f319f81ca2f23e22de16f mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip response.header.accept-ranges=bytes response.header.age=829779 response.header.connection=keep-alive response.header.content-length=26039246 response.header.content-type=binary/octet-stream response.header.date=Sat, 04 Feb 2023 22:58:25 GMT response.header.etag="a08688b701b31515c6861c69e4d87ebd-1" response.header.last-modified=Tue, 06 Dec 2022 20:50:51 GMT response.header.server=AmazonS3 response.header.via=1.1 000f4a2f631bace380a0afa747a82482.cloudfront.net (CloudFront) response.header.x-amz-cf-id=S-h31zheAEOhOs6uH52Rpq0ZnoRRdd5VfaqVbZWXzAX-Zym-0XtuKA== response.header.x-amz-cf-pop=HIO50-C1 response.header.x-amz-replication-status=COMPLETED response.header.x-amz-server-side-encryption=AES256 response.header.x-amz-storage-class=INTELLIGENT_TIERING response.header.x-amz-version-id=BQOjon.COXTTON_j20wZbWWoDEmGy1__ response.header.x-cache=Hit from cloudfront response.status=200 OK size=26039246 url=https://registry.redhat.io/v2/redhat/certified-operator-index/blobs/sha256:130c9d0ca92e54f59b68c4debc5b463674ff9555be1f319f81ca2f23e22de16f




level=debug msg=do request digest=sha256:db8e9d2f583af66157f383f9ec3628b05fa0adb0d837269bc9f89332c65939b9 mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip request.header.accept=application/vnd.docker.image.rootfs.diff.tar.gzip, */* request.header.range=bytes=13417268- request.header.user-agent=opm/alpha request.method=GET size=91700480 url=https://registry.redhat.io/v2/redhat/certified-operator-index/blobs/sha256:db8e9d2f583af66157f383f9ec3628b05fa0adb0d837269bc9f89332c65939b9
level=debug msg=fetch response received digest=sha256:db8e9d2f583af66157f383f9ec3628b05fa0adb0d837269bc9f89332c65939b9 mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip response.header.cache-control=max-age=0, no-cache, no-store response.header.connection=keep-alive response.header.content-length=99 response.header.content-type=application/json response.header.date=Tue, 14 Feb 2023 13:34:06 GMT response.header.docker-distribution-api-version=registry/2.0 response.header.expires=Tue, 14 Feb 2023 13:34:06 GMT response.header.pragma=no-cache response.header.registry-proxy-request-id=0d7ea55f-e96d-4311-885a-125b32c8e965 response.header.www-authenticate=Bearer realm="https://registry.redhat.io/auth/realms/rhcc/protocol/redhat-docker-v2/auth",service="docker-registry",scope="repository:redhat/certified-operator-index:pull" response.status=401 Unauthorized size=91700480 url=https://registry.redhat.io/v2/redhat/certified-operator-index/blobs/sha256:db8e9d2f583af66157f383f9ec3628b05fa0adb0d837269bc9f89332c65939b9.

Expected results:

The command should always read the credentials.

Bug OCPBUGS-23094: [gcp] IPI or UPI private cluster on GCP failed due to ingress LB stuck in Pending

Description of problem:

Installing a private cluster on GCP with either IPI or UPI always fails, with the ingress cluster operator reporting LoadBalancerPending and CanaryChecksRepetitiveFailures.

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2023-11-07-233748

How reproducible:

Always

Steps to Reproduce:

1. Create a private cluster on GCP, either IPI or UPI

Actual results:

The installation failed, with the ingress operator degraded.

Expected results:

The installation succeeds.

Additional info:

Some Prow CI tests:

https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.15-arm64-nightly-gcp-ipi-private-f28-longduration-cloud/1722352860160593920 (Must-gather https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.15-arm64-nightly-gcp-ipi-private-f28-longduration-cloud/1722352860160593920/artifacts/gcp-ipi-private-f28-longduration-cloud/gather-must-gather/artifacts/must-gather.tar)

https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-gcp-ipi-xpn-private-f28/1722176483704705024

https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-gcp-ipi-private-fips-f6-disasterrecovery/1722066338567950336


FYI QE Flexy-install jobs: IPI Flexy-install/245364/, UPI Flexy-install/245524/

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          14h     Unable to apply 4.15.0-0.nightly-2023-11-07-233748: some cluster operators are not available
$ oc get nodes
NAME                                                           STATUS   ROLES                  AGE   VERSION
jiwei-1108-priv-kx7b4-master-0.c.openshift-qe.internal         Ready    control-plane,master   14h   v1.28.3+4cbdd29
jiwei-1108-priv-kx7b4-master-1.c.openshift-qe.internal         Ready    control-plane,master   14h   v1.28.3+4cbdd29
jiwei-1108-priv-kx7b4-master-2.c.openshift-qe.internal         Ready    control-plane,master   14h   v1.28.3+4cbdd29
jiwei-1108-priv-kx7b4-worker-a-l28pl.c.openshift-qe.internal   Ready    worker                 14h   v1.28.3+4cbdd29
jiwei-1108-priv-kx7b4-worker-b-84bx5.c.openshift-qe.internal   Ready    worker                 14h   v1.28.3+4cbdd29
$ oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.15.0-0.nightly-2023-11-07-233748   False       False         True       14h     OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.jiwei-1108-priv.qe.gcp.devcluster.openshift.com/healthz": dial tcp: lookup oauth-openshift.apps.jiwei-1108-priv.qe.gcp.devcluster.openshift.com on 172.30.0.10:53: no such host (this is likely result of malfunctioning DNS server)
baremetal                                  4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h     
cloud-controller-manager                   4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h     
cloud-credential                           4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h     
cluster-autoscaler                         4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h     
config-operator                            4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h     
console                                    4.15.0-0.nightly-2023-11-07-233748   False       True          False      14h     DeploymentAvailable: 0 replicas available for console deployment...
control-plane-machine-set                  4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h     
csi-snapshot-controller                    4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h     
dns                                        4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h     
etcd                                       4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h     
image-registry                             4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h     
ingress                                                                         False       True          True       7h37m   The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: LoadBalancerReady=False (LoadBalancerPending: The LoadBalancer service is pending)
insights                                   4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h     
kube-apiserver                             4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h     
kube-controller-manager                    4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h     
kube-scheduler                             4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h     
kube-storage-version-migrator              4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h     
machine-api                                4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h     
machine-approver                           4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h     
machine-config                             4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h     
marketplace                                4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h     
monitoring                                 4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h     
network                                    4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h     
node-tuning                                4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h     
openshift-apiserver                        4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h     
openshift-controller-manager               4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h     
openshift-samples                          4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h     
operator-lifecycle-manager                 4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h     
operator-lifecycle-manager-catalog         4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h     
operator-lifecycle-manager-packageserver   4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h     
service-ca                                 4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h     
storage                                    4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h     
$ oc describe co ingress
Name:         ingress
Namespace:    
Labels:       <none>
Annotations:  include.release.openshift.io/ibm-cloud-managed: true
              include.release.openshift.io/self-managed-high-availability: true
              include.release.openshift.io/single-node-developer: true
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2023-11-08T10:38:15Z
  Generation:          1
  Owner References:
    API Version:     config.openshift.io/v1
    Controller:      true
    Kind:            ClusterVersion
    Name:            version
    UID:             dbaae892-1b6d-480d-a201-0549d0a3149d
  Resource Version:  172514
  UID:               3922a9fe-584f-458f-ac4f-b62b4842758e
Spec:
Status:
  Conditions:
    Last Transition Time:  2023-11-08T17:49:01Z
    Message:               The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: LoadBalancerReady=False (LoadBalancerPending: The LoadBalancer service is pending)
    Reason:                IngressUnavailable
    Status:                False
    Type:                  Available
    Last Transition Time:  2023-11-08T11:02:27Z
    Message:               Not all ingress controllers are available.
    Reason:                Reconciling
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2023-11-08T17:51:01Z
    Message:               The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: LoadBalancerReady=False (LoadBalancerPending: The LoadBalancer service is pending), CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)
    Reason:                IngressDegraded
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2023-11-08T10:52:36Z
    Reason:                IngressControllersUpgradeable
    Status:                True
    Type:                  Upgradeable
    Last Transition Time:  2023-11-08T10:52:36Z
    Reason:                AsExpected
    Status:                False
    Type:                  EvaluationConditionsDetected
  Extension:               <nil>
  Related Objects:
    Group:      
    Name:       openshift-ingress-operator
    Resource:   namespaces
    Group:      operator.openshift.io
    Name:       
    Namespace:  openshift-ingress-operator
    Resource:   ingresscontrollers
    Group:      ingress.operator.openshift.io
    Name:       
    Namespace:  openshift-ingress-operator
    Resource:   dnsrecords
    Group:      
    Name:       openshift-ingress
    Resource:   namespaces
    Group:      
    Name:       openshift-ingress-canary
    Resource:   namespaces
Events:         <none>
$ oc get pods -n openshift-ingress-operator -o wide
NAME                                READY   STATUS    RESTARTS      AGE   IP            NODE                                                     NOMINATED NODE   READINESS GATES
ingress-operator-57c555c75b-gqbk6   2/2     Running   2 (14h ago)   14h   10.129.0.36   jiwei-1108-priv-kx7b4-master-1.c.openshift-qe.internal   <none>           <none>
$ oc -n openshift-ingress-operator logs ingress-operator-57c555c75b-gqbk6
...output omitted...
2023-11-08T10:56:53.715Z    ERROR    operator.ingress_controller    controller/controller.go:118    got retryable error; requeueing    {"after": "1m0s", "error": "IngressController is degraded: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.), DeploymentReplicasMinAvailable=False (DeploymentMinimumReplicasNotMet: 0/2 of replicas are available, max unavailable is 1: Some pods are not scheduled: Pod \"router-default-7c86c4f4b5-jsljz\" cannot be scheduled: 0/3 nodes are available: 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.. Pod \"router-default-7c86c4f4b5-pltz4\" cannot be scheduled: 0/3 nodes are available: 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.. Make sure you have sufficient worker nodes.), LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: googleapi: Error 400: INSTANCE_IN_MULTIPLE_LOAD_BALANCED_IGS - Validation failed for instance 'projects/openshift-qe/zones/us-central1-a/instances/jiwei-1108-priv-kx7b4-master-0': instance may belong to at most one load-balanced instance group.\nThe kube-controller-manager logs may contain more details.)"}
...output omitted...
2023-11-08T15:13:41.323Z    ERROR    operator.ingress_controller    controller/controller.go:118    got retryable error; requeueing    {"after": "1m0s", "error": "IngressController is degraded: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: googleapi: Error 400: Resource 'projects/openshift-qe/zones/us-central1-b/instances/jiwei-1108-priv-kx7b4-worker-b-84bx5' is expected to be in the subnetwork 'projects/openshift-qe/regions/us-central1/subnetworks/jiwei-1108-priv-master-subnet' but is in the subnetwork 'projects/openshift-qe/regions/us-central1/subnetworks/jiwei-1108-priv-worker-subnet'., wrongSubnetwork\nThe kube-controller-manager logs may contain more details.), CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)"}
...output omitted...
$ 

Must-gather https://drive.google.com/file/d/1zwhJ4ga0-tQuRorha4XnUGUKbSTx1fx4/view?usp=drive_link

https://github.com/openshift/cloud-provider-gcp/pull/41

Task MON-3479: Update downstream prometheus-operator to v0.69.1

Bug OCPBUGS-21626: tokenConfig's accessTokenInactivityTimeout in hosted cluster is not consistent with management cluster

Description: If tokenConfig.accessTokenInactivityTimeout is set to less than 300s in a hosted cluster, the value is accepted but the timeout does not take effect. In the management cluster, by contrast, the following error is returned when trying to set the timeout to less than 300s:

spec.tokenConfig.accessTokenInactivityTimeout: Invalid value: v1.Duration{Duration:100000000000}: the minimum acceptable token timeout value is 300 seconds

Steps to reproduce the issue:

1. Install a fresh 4.15 hypershift cluster  
2. Configure accessTokenInactivityTimeout as below:
$ oc edit hc -n clusters
...
  spec:
    configuration:
      oauth:
        identityProviders:
        ...
        tokenConfig:          
          accessTokenInactivityTimeout: 100s
...
3. Wait for the oauth pods to redeploy and check the oauth ConfigMap for the updated accessTokenInactivityTimeout value:
$ oc get cm oauth-openshift -oyaml -n clusters-hypershift-ci-xxxxx 
...
        tokenConfig:           
          accessTokenInactivityTimeout: 1m40s
...
4. Log in to the guest cluster as testuser-1 and get the token:
$ oc login https://a889<...>:6443 -u testuser-1 -p xxxxxxx
$ TOKEN=`oc whoami -t`

Actual result:

Wait for 100s and try to log in with the TOKEN:
$ oc login --token="$TOKEN"
WARNING: Using insecure TLS client config. Setting this option is not supported!
Logged into "https://a889<...>:6443" as "testuser-1" using the token provided.
You don't have any projects. You can try to create a new project, by running
    oc new-project <projectname>

Expected result:

1. Login fails if the user has not been active within the accessTokenInactivityTimeout period.

2. In the management cluster, the following error is returned when trying to set the timeout to less than 300s:
spec.tokenConfig.accessTokenInactivityTimeout: Invalid value: v1.Duration{Duration:100000000000}: the minimum acceptable token timeout value is 300 seconds
The same validation should be implemented in the hosted cluster.
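The management-cluster check quoted above can be sketched as a minimal Python illustration of the validation the hosted cluster is expected to replicate. The function name and the choice to treat an unset value as valid are assumptions; OpenShift's actual validation is Go code that is not reproduced here. The `Duration:100000000000` in the error is the 100s value expressed in nanoseconds, because `v1.Duration` wraps a Go `time.Duration`.

```python
from datetime import timedelta
from typing import List, Optional

# Minimum accepted by the management cluster, per the error message above.
MIN_ACCESS_TOKEN_INACTIVITY_TIMEOUT = timedelta(seconds=300)


def validate_access_token_inactivity_timeout(timeout: Optional[timedelta]) -> List[str]:
    """Return validation errors for the field; an empty list means it is valid.

    Hypothetical helper. Treating None (field unset) as valid is an assumption.
    """
    if timeout is None or timeout >= MIN_ACCESS_TOKEN_INACTIVITY_TIMEOUT:
        return []
    # v1.Duration wraps a Go time.Duration, which counts nanoseconds.
    nanos = (timeout // timedelta(microseconds=1)) * 1000
    return [
        "spec.tokenConfig.accessTokenInactivityTimeout: Invalid value: "
        f"v1.Duration{{Duration:{nanos}}}: "
        "the minimum acceptable token timeout value is 300 seconds"
    ]


# The 100s value from the reproduction steps is rejected.
print(validate_access_token_inactivity_timeout(timedelta(seconds=100)))
```

Applying the same check in the hosted cluster would reject the 100s value at configuration time instead of silently accepting it.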

https://github.com/openshift/hypershift/pull/3110

Bug OCPBUGS-23128: buildah has trouble with transient mounting of nodev/noexec/nosuid/readonly items

Description of problem:

When building images, items such as the /run/secrets/redhat.repo file from the build container are bind-mounted into the rootfs of the image being built for the benefit of RUN instructions.  For a privileged build, the fact that the bind includes the nodev/noexec/nosuid flags doesn't cause any problems.  When attempting the build without privileges, where the source file (itself mounted into the build container from the host) is not owned by the user the builder container is running as, this can fail because the kernel won't allow a bind mount that tries to remove any of these flags, and the logic which handled transient mounts when using chroot isolation wasn't taking enough care to avoid that possibility.
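The flag handling described above can be sketched as follows. This is a hypothetical Python illustration of the idea, not buildah's Go implementation: when remounting a transient bind mount, keep every nodev/noexec/nosuid/readonly flag that the source mount already carries, because an unprivileged process cannot clear them. The constants are the Linux `MS_*` values from `<sys/mount.h>`; the function name is an assumption.

```python
# Linux mount(2) flag values, as defined in <sys/mount.h>.
MS_RDONLY = 0x1
MS_NOSUID = 0x2
MS_NODEV = 0x4
MS_NOEXEC = 0x8

# Flags that an unprivileged process cannot clear on a bind remount
# when the source mount already has them set.
LOCKED_FLAGS = MS_RDONLY | MS_NOSUID | MS_NODEV | MS_NOEXEC


def transient_remount_flags(source_flags: int, requested_flags: int) -> int:
    """Compute flags for remounting a transient bind mount (hypothetical helper).

    Carry over every locked flag already set on the source mount so the
    remount never asks the kernel to remove one, then add what was requested.
    """
    return requested_flags | (source_flags & LOCKED_FLAGS)


# The host-provided secret is nodev/noexec/nosuid; requesting only read-only
# would drop those flags, so they are preserved in the result.
flags = transient_remount_flags(MS_NODEV | MS_NOEXEC | MS_NOSUID, MS_RDONLY)
print(hex(flags))
```

Taking the union rather than the requested set alone is what avoids the "operation not permitted" remount error in the unprivileged case.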

Version-Release number of selected component (if applicable):

buildah-1.32.0 and earlier

How reproducible:

Always

Steps to Reproduce:

1. On a single-node setup, `touch` /etc/yum.repos.d/redhat.repo, which is the target of a symbolic link in /usr/share/rhel/secrets, which /usr/share/containers/mounts.conf tells CRI-O should have its contents exposed in containers.
2. Attempt to build this spec:
{{
apiVersion: build.openshift.io/v1
kind: Build
metadata:
  name: unprivileged
spec:
  source:
    type: Dockerfile
    dockerfile: |
      FROM registry.fedoraproject.org/fedora-minimal
      RUN find /run/secrets -ls
      RUN head /proc/self/uid_map /proc/self/gid_map /run/secrets/redhat.repo
  strategy:
    type: Docker
    dockerStrategy:
      env:
      - name: BUILD_PRIVILEGED
        value: "false"
}}

Actual results:

error running subprocess: remounting "/tmp/buildahXXX/mnt/rootfs/run/secrets/redhat.repo" in mount namespace with expected flags: operation not permitted

Expected results:

No such mount error.  Depending on the permissions on the file, the unprivileged build may still fail if it attempts to use the contents of that file, but that's not a bug in the builder so much as a consequence of access controls.

Additional info:

https://github.com/openshift/builder/pull/358

Bug OCPBUGS-21648: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-api-provider-azure/pull/285

Bug OCPBUGS-23697: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cloud-provider-alibaba-cloud/pull/48

Bug OCPBUGS-24026: Installer TLS artifacts should have ownership annotations

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:

https://github.com/openshift/installer/pull/7749

Bug OCPBUGS-17458: kube-controller-manager pod in openshift-kube-controller-manager namespace keeps reporting "failed to synchronize namespace" these messages appear even though the namespace is long gone

Description of problem:

The kube-controller-manager pod in the openshift-kube-controller-manager namespace keeps reporting "failed to synchronize namespace" after the namespace is deleted.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

Actual results:

The namespace was deleted long ago, but the kube-controller-manager pod in the openshift-kube-controller-manager namespace still keeps reporting "failed to synchronize namespace".

Expected results:

It should not report errors for a deleted namespace.

Additional info:

https://github.com/openshift/cluster-policy-controller/pull/130

Bug OCPBUGS-19217: Update 4.15 ibm-vpc-node-label-updater image to be consistent with ART

Please review the following PR: https://github.com/openshift/ibm-vpc-node-label-updater/pull/25

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/ibm-vpc-node-label-updater/pull/25

Bug OCPBUGS-19715: Do not configure the node webhook if not using ovn-kubernetes

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:

https://github.com/openshift/cluster-network-operator/pull/2030

Bug OCPBUGS-22757: [4.15] Bootimage bump tracker

Tracker issue for bootimage bump in 4.15. This issue should block issues which need a bootimage bump to fix.

The previous bump was ~~OCPBUGS-20356~~.

https://github.com/openshift/installer/pull/7654

Bug OCPBUGS-19233: Update 4.15 monitoring-plugin image to be consistent with ART

Please review the following PR: https://github.com/openshift/monitoring-plugin/pull/75

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/monitoring-plugin/pull/75

Bug OCPBUGS-19699: Remove warning about CPUPartitioning

Description of problem:


When CPUPartitioning is not set in install-config.yaml, a warning message is still generated:

WARNING CPUPartitioning:  is ignored

This warning is both incorrect, since the check compares against "None" while the value is an empty string when not set, and no longer relevant now that https://issues.redhat.com//browse/OCPBUGS-18876 has been fixed.
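The failure mode can be reproduced with a small Python sketch. The function below is a hypothetical stand-in for the installer's Go check, assuming the message is built by interpolating the field value: comparing against the string "None" lets an unset (empty) value through, which also explains the double space in the reported message.

```python
from typing import Optional


def cpu_partitioning_warning(cpu_partitioning: str) -> Optional[str]:
    """Hypothetical stand-in for the reported check.

    The comparison is against the string "None", but an unset field arrives
    as "", so the warning is emitted even when nothing was configured.
    """
    if cpu_partitioning != "None":
        return f"CPUPartitioning: {cpu_partitioning} is ignored"
    return None


# Unset field: yields the message from the report, including its double space.
print(repr(cpu_partitioning_warning("")))
```

The issue title indicates the warning was removed altogether rather than the comparison being corrected.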

Version-Release number of selected component (if applicable):

How reproducible:

Every time

Steps to Reproduce:

1. Create an install config with CPUPartitioning not set
2. Run "openshift-install agent create image --dir cluster-manifests/ --log-level debug"

Actual results:

See the output "WARNING CPUPartitioning:  is ignored"

Expected results:

No warning

Additional info:

https://github.com/openshift/installer/pull/7527

Bug OCPBUGS-20531: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/multus-admission-controller/pull/76

Bug OCPBUGS-24158: Update 4.15 kube-state-metrics-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/kube-state-metrics/pull/105

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/kube-state-metrics/pull/105

Bug OCPBUGS-19119: Update 4.15 cluster-policy-controller image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-policy-controller/pull/131

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-policy-controller/pull/131

Bug OCPBUGS-20503: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/operator-framework-olm/pull/585

Bug OCPBUGS-21933: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-version-operator/pull/983

Bug OCPBUGS-23149: Wrong IP for deploying IPv6 BMCs

Since Apache runs as part of the metal3 Deployment, it exists on only one node. There is no guarantee that the API VIP will land (or stay) on the same node, so this fails to work more often than not. Kube-proxy does not do anything to redirect traffic to pods with host networking enabled, such as the metal3 Deployment.

The IPv6 address is passed to the baremetal-operator. This has been split into its own Deployment since the first iteration of ~~OCPBUGS-4228~~, in which we collected the IP address of the host from the deployed metal3 Pod. At the time, that caused a circular dependency of the Deployment on its own Pod, but this would no longer be the case. However, a backport beyond 4.14 would require the Deployment split to also be backported.

Alternatively, ironic-proxy could be adapted to also proxy the images produced by ironic. This would be new functionality that would also need to be backported.

Finally, we could determine the host IP from inside the baremetal-operator container instead of from cluster-baremetal-operator. However, this approach has not been tried and would only work in backports because it relies on baremetal-operator continuing to run within the same Pod as ironic.

https://github.com/openshift/cluster-baremetal-operator/pull/380

Bug OCPBUGS-21815: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/machine-api-operator/pull/1170

Bug ACM-7278: hcp with --secrets-creds provided still requires pull secret

Description of problem:

When we try to create a cluster with --secret-creds (an MCE AWS Kubernetes secret that includes aws-creds, the pull secret, and the base domain), the binary should not ask for a pull secret. However, it now does after the change from hypershift.

Adding the pull-secret parameter allows the command to continue as expected, although the whole point of --secret-creds should be to reuse what already exists.

/usr/local/bin/hcp create cluster aws --name acmqe-hc-ad5b1f645d93464c --secret-creds test1-cred --region us-east-1 --node-pool-replicas 1 --namespace local-cluster --instance-type m6a.xlarge --release-image quay.io/openshift-release-dev/ocp-release:4.14.0-ec.4-multi --generate-ssh

Output:
  Error: required flag(s) "pull-secret" not set
  required flag(s) "pull-secret" not set
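The error text above matches the wording of cobra's required-flag check, which suggests (but does not confirm) that --pull-secret is marked as always required. A minimal Go sketch of the expected behavior, assuming credential validation happens after flag parsing instead; `createOptions` and `validateCredentials` are hypothetical names, not hcp's actual types:

```go
package main

import (
	"errors"
	"fmt"
)

// createOptions holds the two flags relevant to this bug; the struct and
// field names are hypothetical, not hcp's actual option types.
type createOptions struct {
	PullSecretFile string // value of --pull-secret
	SecretCreds    string // value of --secret-creds
}

// validateCredentials requires --pull-secret only when --secret-creds is not
// given, since the referenced secret is expected to already contain a pull secret.
func validateCredentials(o createOptions) error {
	if o.SecretCreds == "" && o.PullSecretFile == "" {
		return errors.New(`required flag(s) "pull-secret" not set`)
	}
	return nil
}

func main() {
	fmt.Println(validateCredentials(createOptions{SecretCreds: "test1-cred"})) // <nil>
	fmt.Println(validateCredentials(createOptions{}))
}
```

This keeps the existing error message for the case where neither source of a pull secret is provided.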

Version-Release number of selected component (if applicable):

2.4.0-DOWNANDBACK-2023-08-31-13-34-02 or mce 2.4.0-137

hcp version openshift/hypershift: 8b4b52925d47373f3fe4f0d5684c88dc8a93368a. Latest supported OCP: 4.14.0

How reproducible:

always

Steps to Reproduce:

1. Download the hcp CLI from MCE
2. Run hcp create cluster aws with a valid --secret-creds parameter
...

Actual results:

Expected results:

Additional info:

https://github.com/openshift/hypershift/pull/3013

Bug OCPBUGS-21776: [HyperShift] Runtime zero namespaces are not excluded from pod security in guest cluster

Description of problem: Runtime zero namespaces ("default", "kube-system", "kube-public") are not excluded from pod security admission in the HyperShift guest cluster.
In OCP, these runtime zero namespaces are excluded from PSA.

How reproducible: Always

Steps to Reproduce:

1. Install a fresh 4.14 HyperShift cluster
2. Check the labels on the default, kube-system, and kube-public namespaces
3. Try to change the PSA value on these namespaces in the HyperShift guest cluster; the values are updated.

Actual results:

$ oc get ns default -oyaml --kubeconfig=guest.kubeconfig
...
  labels:
    kubernetes.io/metadata.name: default
  name: default
...
$ oc label ns default pod-security.kubernetes.io/enforce=restricted --overwrite --kubeconfig=guest.kubeconfig
namespace/default labeled
$ oc get ns default -oyaml --kubeconfig=guest.kubeconfig
...
  labels:
    kubernetes.io/metadata.name: default
    pod-security.kubernetes.io/enforce: restricted
  name: default

Expected results:

Runtime zero namespaces ("default", "kube-system", "kube-public") are excluded from pod security admission

Additional info:

The kube-system namespace is excluded from PSA in the guest cluster, but when we try to set security.openshift.io/scc.podSecurityLabelSync to true or false, the value is not updated, whereas in the management cluster the podSecurityLabelSync value is updated.

https://github.com/openshift/hypershift/pull/3115

Bug OCPBUGS-20391: Revert https://issues.redhat.com//browse/NETOBSERV-987

Revert https://github.com/openshift/must-gather/pull/357 as it's part of a dedicated https://github.com/netobserv/must-gather image

https://github.com/openshift/must-gather/pull/390

Task OPRUN-3106: Investigate and Fix e2e Failures in Image Update test

This downstream PR fails continuously on the Image Update test; the goal of this task is to identify the root cause and fix it.

https://github.com/openshift/operator-framework-olm/pull/600

Bug OCPBUGS-17589: CBO crashes if internal IP is nil

This bug was found during the analysis of another issue.

If the server internal IP is not defined, CBO crashes because nil is not handled in https://github.com/openshift/cluster-baremetal-operator/blob/release-4.12/provisioning/utils.go#L99

I0809 17:33:09.683265       1 provisioning_controller.go:540] No Machines with cluster-api-machine-role=master found, set provisioningMacAddresses if the metal3 pod fails to start

I0809 17:33:09.690304       1 clusteroperator.go:217] "new CO status" reason=SyncingResources processMessage="Applying metal3 resources" message=""

I0809 17:33:10.488862       1 recorder_logging.go:37] &Event{ObjectMeta:{dummy.1779c769624884f4  dummy    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] []  []},InvolvedObject:ObjectReference{Kind:Pod,Namespace:dummy,Name:dummy,UID:,APIVersion:v1,ResourceVersion:,FieldPath:,},Reason:ValidatingWebhookConfigurationUpdated,Message:Updated ValidatingWebhookConfiguration.admissionregistration.k8s.io/baremetal-operator-validating-webhook-configuration because it changed,Source:EventSource{Component:,Host:,},FirstTimestamp:2023-08-09 17:33:10.488745204 +0000 UTC m=+5.906952556,LastTimestamp:2023-08-09 17:33:10.488745204 +0000 UTC m=+5.906952556,Count:1,Type:Normal,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:,ReportingInstance:,}

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1768fd4]

goroutine 574 [running]:
github.com/openshift/cluster-baremetal-operator/provisioning.getServerInternalIP({0x1e774d0?, 0xc0001e8fd0?})
        /go/src/github.com/openshift/cluster-baremetal-operator/provisioning/utils.go:75 +0x154
github.com/openshift/cluster-baremetal-operator/provisioning.GetIronicIP({0x1ea2378?, 0xc000856840?}, {0x1bc1f91, 0x15}, 0xc0004c4398, {0x1e774d0, 0xc0001e8fd0})
        /go/src/github.com/openshift/cluster-baremetal-operator/provisioning/utils.go:98 +0xfb

https://github.com/openshift/cluster-baremetal-operator/pull/359

Bug OCPBUGS-23610: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cloud-provider-ibm/pull/64

Bug OCPBUGS-19086: Give instruction to install nmstate package in error message

Description of problem:

If nmstatectl is not present, the error message should tell the user to install the nmstate package.

Version-Release number of selected component (if applicable):

4.13

How reproducible:

100%

Steps to Reproduce:

1.
2.
3.

Actual results:

FATAL   * failed to validate network yaml for host 0, failed to execute 'nmstatectl gc', error: exec: "nmstatectl": executable file not found in $PATH

Expected results:

FATAL   * failed to validate network yaml for host 0, install nmstate package, exec: "nmstatectl": executable file not found in $PATH

Additional info:

https://github.com/openshift/installer/pull/7492

Bug OCPBUGS-26063: [release-4.15] IBMCloud: Add support for endpoint overrides

Description of problem:

cherry-pick of https://github.com/openshift/cluster-image-registry-operator/pull/955

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/cluster-image-registry-operator/pull/984

Bug OCPBUGS-19916: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/console/pull/13259

Bug OCPBUGS-23576: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/kubernetes-autoscaler/pull/275

Bug OCPBUGS-19722: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/machine-config-operator/pull/3955

Bug OCPBUGS-18387: [CORS-2550] Installer should have pre-check for vm type, DES encryption type when install with Confidential VM

Description of problem:

When installing an IPI cluster with confidential VMs, the installer should pre-check the VM type, disk encryption type, and so on, to avoid installation failures during infrastructure creation.

1. VM type
Different VM types support different security types.
For example, if platform.azure.defaultMachinePlatform.type is set to Standard_DC8ads_v5 and platform.azure.defaultMachinePlatform.settings.securityType is set to TrustedLaunch, the installation fails because Standard_DC8ads_v5 supports only the ConfidentialVM security type.

ERROR Error: creating Linux Virtual Machine: (Name "jimaconf1-89qmp-bootstrap" / Resource Group "jimaconf1-89qmp-rg"): compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="BadRequest" Message="The VM size 'Standard_DC16ads_v5' is not supported for creation of VMs and Virtual Machine Scale Set with 'TrustedLaunch' security type." 

2. Disk Encryption Set
When installing a cluster with ConfidentialVM and securityEncryptionType: DiskWithVMGuestState using a customer-managed key, the DES encryption type must be ConfidentialVmEncryptedWithCustomerKey; otherwise the installer throws the following error:

08-31 10:12:54.443  level=error msg=Error: creating Linux Virtual Machine: (Name "jima30confa-vtrm2-bootstrap" / Resource Group "jima30confa-vtrm2-rg"): compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="BadRequest" Message="The type of the Disk Encryption Set in the request is 'ConfidentialVmEncryptedWithCustomerKey', but this Disk Encryption Set was created with type 'EncryptionAtRestWithCustomerKey'." Target="/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/jima30confa-vtrm2-rg/providers/Microsoft.Compute/disks/jima30confa-vtrm2-bootstrap_OSDisk"

The installer should check the VM type and the DES encryption type to make sure that the expected DES is set.
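A minimal Go sketch of the DES part of such a pre-check, using the Azure strings quoted in this report. It assumes the DES encryption type has already been looked up from Azure (that lookup is not shown), and the function name `validateDESForConfidentialVM` is hypothetical:

```go
package main

import "fmt"

// Azure values quoted in this report.
const (
	diskWithVMGuestState                   = "DiskWithVMGuestState"
	confidentialVMEncryptedWithCustomerKey = "ConfidentialVmEncryptedWithCustomerKey"
)

// validateDESForConfidentialVM returns an error when confidential VMs with
// DiskWithVMGuestState use a DES whose encryption type is not
// ConfidentialVmEncryptedWithCustomerKey. Hypothetical helper; desEncryptionType
// is assumed to have been fetched from Azure beforehand ("" means no DES is used).
func validateDESForConfidentialVM(securityEncryptionType, desEncryptionType string) error {
	if securityEncryptionType != diskWithVMGuestState || desEncryptionType == "" {
		return nil // the check does not apply
	}
	if desEncryptionType != confidentialVMEncryptedWithCustomerKey {
		return fmt.Errorf("disk encryption set has encryption type %q, but securityEncryptionType %s requires %q",
			desEncryptionType, diskWithVMGuestState, confidentialVMEncryptedWithCustomerKey)
	}
	return nil
}

func main() {
	fmt.Println(validateDESForConfidentialVM(diskWithVMGuestState, "EncryptionAtRestWithCustomerKey"))
}
```

A similar table-driven check could cover the VM type case, but which VM sizes support which security types would have to come from Azure's SKU capabilities rather than a hard-coded list.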

Version-Release number of selected component (if applicable):

4.14 nightly build

How reproducible:

Always

Steps to Reproduce:

1. Prepare install-config with one of the following:
   1) confidentialVM enabled, but with a VM type that does not support Confidential VM
   2) TrustedLaunch enabled, but with a VM type that supports confidentialVM
   3) confidentialVM + securityEncryptionType: DiskWithVMGuestState enabled, using a customer-managed key for encryption, but the customer-managed key's encryption type is the default "EncryptionAtRestWithPlatformKey"
2. Create the cluster

Actual results:

Installation failed when creating infrastructure

Expected results:

The installer should pre-check these scenarios and exit with an appropriate error message.

Additional info:

https://github.com/openshift/installer/pull/7469

Bug OCPBUGS-24191: [4.14] Load balancers are not created in ARO

After creating a 4.14 ARO cluster, some cluster operators are not available because the load balancer cannot be created.

This is caused by the change to the default value of vmType in cloud-provider-azure:

https://github.com/kubernetes-sigs/cloud-provider-azure/pull/4214

In ARO, we use the standard vmType and do not use any VMSS for cluster nodes, but the installer does not specify vmType. This causes a vmType mismatch, and cloud-provider-azure cannot configure the load balancer.

https://github.com/openshift/installer/blob/release-4.14/pkg/asset/manifests/azure/cloudproviderconfig.go

We would like vmType to default to `standard`, or to have an option to set it through the install config or a similar mechanism.

discussion thread: https://redhat-internal.slack.com/archives/C68TNFWA2/p1700814868246649

Reproducible steps:

Create a 4.14 ARO cluster.
Creating a regular cluster with standard VMs in Azure might also reproduce the issue.

What I got:

❯ oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.14.1    False       True          True       21m     OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.atokubi.eastus.osadev.cloud/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)...
cloud-controller-manager                   4.14.1    True        False         False      24m
cloud-credential                           4.14.1    True        False         False      26m
cluster-autoscaler                         4.14.1    True        False         False      20m
config-operator                            4.14.1    True        False         False      21m
console                                    4.14.1    False       True          False      13m     DeploymentAvailable: 0 replicas available for console deployment...
control-plane-machine-set                  4.14.1    True        False         False      14m
csi-snapshot-controller                    4.14.1    True        False         False      20m
dns                                        4.14.1    True        False         False      20m
etcd                                       4.14.1    True        False         False      19m
image-registry                             4.14.1    True        False         False      8m11s
ingress                                              False       True          True       7m36s   The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: failed to map VM Name to NodeName: VM Name atokubi-vnkt5-master-0...
insights                                   4.14.1    True        False         False      14m
kube-apiserver                             4.14.1    True        True          False      10m     NodeInstallerProgressing: 1 nodes are at revision 5; 2 nodes are at revision 6
kube-controller-manager                    4.14.1    True        False         False      18m
kube-scheduler                             4.14.1    True        False         False      17m
kube-storage-version-migrator              4.14.1    True        False         False      21m
machine-api                                4.14.1    True        False         False      11m
machine-approver                           4.14.1    True        False         False      20m
machine-config                             4.14.1    True        False         False      15m
marketplace                                4.14.1    True        False         False      20m
monitoring                                 4.14.1    True        False         False      6m53s
network                                    4.14.1    True        False         False      22m
node-tuning                                4.14.1    True        False         False      20m
openshift-apiserver                        4.14.1    True        False         False      14m
openshift-controller-manager               4.14.1    True        False         False      20m
openshift-samples                          4.14.1    True        False         False      14m
operator-lifecycle-manager                 4.14.1    True        False         False      20m
operator-lifecycle-manager-catalog         4.14.1    True        False         False      20m
operator-lifecycle-manager-packageserver   4.14.1    True        False         False      14m
service-ca                                 4.14.1    True        False         False      21m
storage                                    4.14.1    True        False         False      20m

❯ oc get svc -A | grep LoadBalancer
openshift-ingress                                  router-default                             LoadBalancer   172.30.43.24     <pending>                              80:32538/TCP,443:31115/TCP                38m

❯ oc get cm cloud-provider-config -n openshift-config -oyaml
apiVersion: v1
data:
  config: '{"cloud":"AzurePublicCloud","tenantId":"<redacted>","aadClientId":"","aadClientSecret":"","aadClientCertPath":"","aadClientCertPassword":"","useManagedIdentityExtension":false,"userAssignedIdentityID":"","subscriptionId":"<redacted>","resourceGroup":"aro-atokubi","location":"eastus","vnetName":"dev-vnet","vnetResourceGroup":"v4-eastus","subnetName":"atokubi-worker","securityGroupName":"atokubi-vnkt5-nsg","routeTableName":"atokubi-vnkt5-node-routetable","primaryAvailabilitySetName":"","vmType":"","primaryScaleSetName":"","cloudProviderBackoff":true,"cloudProviderBackoffRetries":0,"cloudProviderBackoffExponent":0,"cloudProviderBackoffDuration":6,"cloudProviderBackoffJitter":0,"cloudProviderRateLimit":false,"cloudProviderRateLimitQPS":0,"cloudProviderRateLimitBucket":0,"cloudProviderRateLimitQPSWrite":0,"cloudProviderRateLimitBucketWrite":0,"useInstanceMetadata":true,"loadBalancerSku":"standard","excludeMasterFromStandardLB":false,"disableOutboundSNAT":true,"maximumLoadBalancerRuleCount":0}'
kind: ConfigMap
metadata:
  creationTimestamp: "2023-11-29T10:08:19Z"
  name: cloud-provider-config
  namespace: openshift-config
  resourceVersion: "33363"
  uid: 8b35cf3f-65ee-428d-92e6-304165301e96

❯ oc logs azure-cloud-controller-manager-fbdfbdb86-hk646 -n openshift-cloud-controller-manager
Defaulted container "cloud-controller-manager" out of: cloud-controller-manager, azure-inject-credentials (init)
<omitted>
I1129 10:46:47.401672       1 controller.go:388] Ensuring load balancer for service openshift-ingress/router-default
I1129 10:46:47.401732       1 azure_loadbalancer.go:122] reconcileService: Start reconciling Service "openshift-ingress/router-default" with its resource basename "ac376ce0f66164eebb9fc0fa76a9c697"
I1129 10:46:47.401742       1 azure_loadbalancer.go:1533] reconcileLoadBalancer for service(openshift-ingress/router-default) - wantLb(true): started
I1129 10:46:47.401849       1 event.go:307] "Event occurred" object="openshift-ingress/router-default" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
I1129 10:46:47.505374       1 azure_loadbalancer_repo.go:73] LoadBalancerClient.List(aro-atokubi) success
I1129 10:46:47.573290       1 azure_loadbalancer.go:1557] reconcileLoadBalancer for service(openshift-ingress/router-default): lb(aro-atokubi/atokubi-vnkt5) wantLb(true) resolved load balancer name
I1129 10:46:47.643053       1 azure_vmssflex_cache.go:162] Could not find node () in the existing cache. Forcely freshing the cache to check again...
E1129 10:46:47.716774       1 azure_vmssflex.go:379] fs.GetNodeNameByIPConfigurationID(/subscriptions/fe16a035-e540-4ab7-80d9-373fa9a3d6ae/resourceGroups/aro-atokubi/providers/Microsoft.Network/networkInterfaces/atokubi-vnkt5-master0-nic/ipConfigurations/pipConfig) failed. Error: failed to map VM Name to NodeName: VM Name atokubi-vnkt5-master-0
E1129 10:46:47.716802       1 azure_loadbalancer.go:126] reconcileLoadBalancer(openshift-ingress/router-default) failed: failed to map VM Name to NodeName: VM Name atokubi-vnkt5-master-0
I1129 10:46:47.716835       1 azure_metrics.go:115] "Observed Request Latency" latency_seconds=0.315082823 request="services_ensure_loadbalancer" resource_group="aro-atokubi" subscription_id="fe16a035-e540-4ab7-80d9-373fa9a3d6ae" source="openshift-ingress/router-default" result_code="failed_ensure_loadbalancer"
E1129 10:46:47.716866       1 controller.go:291] error processing service openshift-ingress/router-default (will retry): failed to ensure load balancer: failed to map VM Name to NodeName: VM Name atokubi-vnkt5-master-0
I1129 10:46:47.716964       1 event.go:307] "Event occurred" object="openshift-ingress/router-default" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: failed to map VM Name to NodeName: VM Name atokubi-vnkt5-master-0"

After changing vmType from empty to "standard" in cloud-provider-config, cloud-provider-azure can configure the load balancer and the errors are gone.

https://github.com/openshift/installer/pull/7793

Bug OCPBUGS-24473: When a custom endpoint is set, the private IAM URL is also overridden when installing an IBM Cloud cluster

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/installer/pull/7805

Bug OCPBUGS-25841: [vsphere] IPI destroy cluster failed to delete TagCategory

Description of problem:

After running ./openshift-install destroy cluster, the TagCategory still exists.

# ./openshift-install destroy cluster --dir cluster --log-level debug
DEBUG OpenShift Installer 4.15.0-0.nightly-2023-12-18-220750
DEBUG Built from commit 2b894776f1653ab818e368fa625019a6de82a8c7
DEBUG Power Off Virtual Machines
DEBUG Powered off                                   VirtualMachine=sgao-devqe-spn2w-master-2
DEBUG Powered off                                   VirtualMachine=sgao-devqe-spn2w-master-1
DEBUG Powered off                                   VirtualMachine=sgao-devqe-spn2w-master-0
DEBUG Powered off                                   VirtualMachine=sgao-devqe-spn2w-worker-0-kpg46
DEBUG Powered off                                   VirtualMachine=sgao-devqe-spn2w-worker-0-w5rrn
DEBUG Delete Virtual Machines
INFO Destroyed                                     VirtualMachine=sgao-devqe-spn2w-rhcos-generated-region-generated-zone
INFO Destroyed                                     VirtualMachine=sgao-devqe-spn2w-master-2
INFO Destroyed                                     VirtualMachine=sgao-devqe-spn2w-master-1
INFO Destroyed                                     VirtualMachine=sgao-devqe-spn2w-master-0
INFO Destroyed                                     VirtualMachine=sgao-devqe-spn2w-worker-0-kpg46
INFO Destroyed                                     VirtualMachine=sgao-devqe-spn2w-worker-0-w5rrn
DEBUG Delete Folder
INFO Destroyed                                     Folder=sgao-devqe-spn2w
DEBUG Delete                                        StoragePolicy=openshift-storage-policy-sgao-devqe-spn2w
INFO Destroyed                                     StoragePolicy=openshift-storage-policy-sgao-devqe-spn2w
DEBUG Delete                                        Tag=sgao-devqe-spn2w
INFO Deleted                                       Tag=sgao-devqe-spn2w
DEBUG Delete                                        TagCategory=openshift-sgao-devqe-spn2w
INFO Deleted                                       TagCategory=openshift-sgao-devqe-spn2w
DEBUG Purging asset "Metadata" from disk
DEBUG Purging asset "Master Ignition Customization Check" from disk
DEBUG Purging asset "Worker Ignition Customization Check" from disk
DEBUG Purging asset "Terraform Variables" from disk
DEBUG Purging asset "Kubeconfig Admin Client" from disk
DEBUG Purging asset "Kubeadmin Password" from disk
DEBUG Purging asset "Certificate (journal-gatewayd)" from disk
DEBUG Purging asset "Cluster" from disk
INFO Time elapsed: 29s
INFO Uninstallation complete!

# govc tags.category.ls | grep sgao
openshift-sgao-devqe-spn2w

Version-Release number of selected component (if applicable):

    4.15.0-0.nightly-2023-12-18-220750

How reproducible:

    always

Steps to Reproduce:

    1. IPI install OCP on vSphere
    2. Destroy cluster installed, check TagCategory

Actual results:

    TagCategory still exists

Expected results:

    TagCategory should be deleted

Additional info:

    Also reproduced in openshift-install-linux-4.14.0-0.nightly-2023-12-20-184526 and 4.13.0-0.nightly-2023-12-21-194724, while 4.12.0-0.nightly-2023-12-21-162946 does not have this issue.

https://github.com/openshift/installer/pull/7876

Bug OCPBUGS-19303: OKD: Agent-based Installer is broken on OKD/FCOS

Description of problem:

OKD/FCOS uses FCOS for its boot image, which lacks several tools and services, such as oc and crio, that the rendezvous host of the Agent-based Installer needs to set up a bootstrap control plane.

Version-Release number of selected component (if applicable):

4.13.0
4.14.0
4.15.0

https://github.com/openshift/installer/pull/7484

Bug OCPBUGS-21864: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/oc-mirror/pull/712

Bug OCPBUGS-22840: [azure] Installer should have some pre-check for field plan when using marketplace image

Description of problem:

Install a cluster with the Azure Marketplace image 413.92.2023101700 and set the field osImage:plan to NoPurchasePlan.

install-config.yaml:
--------------------
platform:
  azure:
    baseDomainResourceGroupName: os4-common
    cloudName: AzurePublicCloud
    outboundType: Loadbalancer
    region: southcentralus
    defaultMachinePlatform:
      osImage:
        offer: rh-ocp-worker
        publisher: redhat
        sku: rh-ocp-worker-gen1
        version: 413.92.2023101700
        plan: NoPurchasePlan


Provisioning the bootstrap VM fails with the following Terraform error:

DEBUG In addition to the other similar warnings shown, 3 other variable(s) defined 
DEBUG without being declared.                      
ERROR                                              
ERROR Error: waiting for creation of Linux Virtual Machine: (Name "jima02test-7jf8d-bootstrap" / Resource Group "jima02test-7jf8d-rg"): Code="VMMarketplaceInvalidInput" Message="Creating a virtual machine from Marketplace image or a custom image sourced from a Marketplace image requires Plan information in the request. VM: '/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/jima02test-7jf8d-rg/providers/Microsoft.Compute/virtualMachines/jima02test-7jf8d-bootstrap'." 
ERROR                                              
ERROR   with azurerm_linux_virtual_machine.bootstrap, 
ERROR   on main.tf line 194, in resource "azurerm_linux_virtual_machine" "bootstrap": 
ERROR  194: resource "azurerm_linux_virtual_machine" "bootstrap" { 
ERROR                                              
ERROR failed to fetch Cluster: failed to generate asset "Cluster": failure applying terraform for "bootstrap" stage: failed to create cluster: failed to apply Terraform: exit status 1 
ERROR                                              
ERROR Error: waiting for creation of Linux Virtual Machine: (Name "jima02test-7jf8d-bootstrap" / Resource Group "jima02test-7jf8d-rg"): Code="VMMarketplaceInvalidInput" Message="Creating a virtual machine from Marketplace image or a custom image sourced from a Marketplace image requires Plan information in the request. VM: '/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/jima02test-7jf8d-rg/providers/Microsoft.Compute/virtualMachines/jima02test-7jf8d-bootstrap'." 
ERROR                                              
ERROR   with azurerm_linux_virtual_machine.bootstrap, 
ERROR   on main.tf line 194, in resource "azurerm_linux_virtual_machine" "bootstrap": 
ERROR  194: resource "azurerm_linux_virtual_machine" "bootstrap" { 
ERROR                                              
ERROR

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-11-01-235040

How reproducible:

Always

Steps to Reproduce:

1. Set an Azure Marketplace image (one that has a purchase plan) and plan: NoPurchasePlan in the install-config.yaml file
2. Trigger the installation

Actual results:

The bootstrap VM fails to provision.

Expected results:

The installer should validate the plan when a Marketplace image with a purchase plan is used, and exit early with a proper error message.

Additional info:

https://github.com/openshift/installer/pull/7721

Bug OCPBUGS-4069: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-monitoring-operator/pull/2117

Bug OCPBUGS-19154: Update 4.15 openshift-enterprise-registry image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/image-registry/pull/379

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/image-registry/pull/379

Bug OCPBUGS-22213: Links for CodeEditor in console-dynamic-plugin-sdk api docs are returning 404

View the Description View the linked PRs

Description of problem:

Links for the CodeEditor component return 404.
Check the links for the options and ref parameters: https://github.com/openshift/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#codeeditor

Bug OCPBUGS-26535: [4.15] SDN Failures for [sig-network][Feature:tuning] sysctl allowlist update should start a pod with custom sysctl only when the sysctl is added to whitelist [Suite:openshift/conformance/parallel]

View the Description View the linked PRs

We are seeing failures for SDN periodics running [sig-network][Feature:tuning] sysctl allowlist update should start a pod with custom sysctl only when the sysctl is added to whitelist [Suite:openshift/conformance/parallel], beginning with 4.16.0-0.nightly-2024-01-05-205447.

sippy: sysctl allowlist update should start a pod with custom sysctl only when the sysctl is added to whitelist

  Jan  5 23:14:22.066: INFO: At 2024-01-05 23:14:09 +0000 UTC - event for testpod: {kubelet ip-10-0-54-42.us-west-2.compute.internal} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_testpod_e2e-test-tuning-bzspr_2a9ce6e0-726d-47a6-ac64-71d430926574_0(968a55c5afd81e077b1d15a4129084d5f15002ac3ae6aa9fe32648e841940fe2): error adding pod e2e-test-tuning-bzspr_testpod to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): timed out waiting for the condition

That payload contains OCPBUGS-26222 (adds a wait on Unix socket readiness). It is not yet clear whether that is the cause, but we will investigate.

https://github.com/openshift/multus-cni/pull/209

Bug OCPBUGS-23947: [azure] Fail to create cluster on existing vnet on MAG and ASH

View the Description View the linked PRs

When creating a cluster on an existing VNet on MAG and ASH, the installer failed with the following error:

11-27 13:42:03.944  level=info msg=Creating infrastructure resources...
11-27 13:42:04.502  level=fatal msg=failed to fetch Cluster: failed to generate asset "Cluster": failed to get the virtual network "jima27maga-vnet": GET https://management.azure.com/subscriptions/8fe0c1b4-8b05-4ef7-8129-7cf5680f27e7/resourceGroups/jima27maga-rg/providers/Microsoft.Network/virtualNetworks/jima27maga-vnet
11-27 13:42:04.503  level=fatal msg=--------------------------------------------------------------------------------
11-27 13:42:04.503  level=fatal msg=RESPONSE 404: 404 Not Found
11-27 13:42:04.503  level=fatal msg=ERROR CODE: SubscriptionNotFound
11-27 13:42:04.503  level=fatal msg=--------------------------------------------------------------------------------
11-27 13:42:04.503  level=fatal msg={
11-27 13:42:04.503  level=fatal msg=  "error": {
11-27 13:42:04.503  level=fatal msg=    "code": "SubscriptionNotFound",
11-27 13:42:04.503  level=fatal msg=    "message": "The subscription '8fe0c1b4-8b05-4ef7-8129-7cf5680f27e7' could not be found."
11-27 13:42:04.504  level=fatal msg=  }
11-27 13:42:04.504  level=fatal msg=}
11-27 13:42:04.504  level=fatal msg=--------------------------------------------------------------------------------
11-27 13:42:04.504  level=fatal

While destroying the cluster, the following error occurred when removing shared tags:

$ ./openshift-install destroy cluster --dir ipi --log-level debug
DEBUG OpenShift Installer 4.15.0-0.nightly-2023-11-25-110147 
DEBUG Built from commit 1ea1a54a197501cdbda71196c7fac744f835217f 
INFO Credentials loaded from file "/home/fedora/.azure/osServicePrincipal_gov.json" 
DEBUG deleting public records                      
WARNING no DNS records found: either they were already deleted or the service principal lacks permissions to list them 
DEBUG deleting resource group                      
INFO deleted                                       resource group=jima761122c-264bb-rg
DEBUG deleting application registrations           
DEBUG failed to query resources with shared tag: POST https://management.azure.com/providers/Microsoft.ResourceGraph/resources 
DEBUG -------------------------------------------------------------------------------- 
DEBUG RESPONSE 400: 400 Bad Request                
DEBUG ERROR CODE: BadRequest                       
DEBUG -------------------------------------------------------------------------------- 
DEBUG {                                            
DEBUG   "error": {                                 
DEBUG     "code": "BadRequest",                    
DEBUG     "message": "Please provide below info when asking for support: timestamp = 2023-11-27T06:25:26.3355852Z, correlationId = b4dfd555-86b0-4e68-aec7-f75cd7307c69.", 
DEBUG     "details": [                             
DEBUG       {                                      
DEBUG         "code": "NoValidSubscriptionsInQueryRequest", 
DEBUG         "message": "There must be at least one subscription that is eligible to contain resources. Given: '8fe0c1b4-8b05-4ef7-8129-7cf5680f27e7'." 
DEBUG       }                                      
DEBUG     ]                                        
DEBUG   }                                          
DEBUG }                                            
DEBUG -------------------------------------------------------------------------------- 
DEBUG                                              
FATAL Failed to destroy cluster: failed to remove shared tags: failed to query resources with shared tag: POST https://management.azure.com/providers/Microsoft.ResourceGraph/resources 
FATAL -------------------------------------------------------------------------------- 
FATAL RESPONSE 400: 400 Bad Request                
FATAL ERROR CODE: BadRequest                       
FATAL -------------------------------------------------------------------------------- 
FATAL {                                            
FATAL   "error": {                                 
FATAL     "code": "BadRequest",                    
FATAL     "message": "Please provide below info when asking for support: timestamp = 2023-11-27T06:25:26.3355852Z, correlationId = b4dfd555-86b0-4e68-aec7-f75cd7307c69.", 
FATAL     "details": [                             
FATAL       {                                      
FATAL         "code": "NoValidSubscriptionsInQueryRequest", 
FATAL         "message": "There must be at least one subscription that is eligible to contain resources. Given: '8fe0c1b4-8b05-4ef7-8129-7cf5680f27e7'." 
FATAL       }                                      
FATAL     ]                                        
FATAL   }                                          
FATAL }                                            
FATAL -------------------------------------------------------------------------------- 
FATAL

The issue was likely introduced by https://github.com/openshift/installer/pull/7611/. Because all accepted 4.15 nightly builds contain PR #7611, it cannot be verified on earlier payloads; however, Prow CI jobs show that installation succeeded with 4.15.0-0.nightly-2023-11-20-045323.
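One possible reading of the logs above (not confirmed in this report) is that requests for the MAG cluster went to the Azure public cloud endpoint, https://management.azure.com, where the Government subscription does not exist. A minimal Go sketch of selecting the Resource Manager endpoint per cloud follows; the cloud name strings and the armEndpoint helper are illustrative assumptions, and Azure Stack Hub has no fixed endpoint, so it must be supplied by the user.

```go
package main

import "fmt"

// armEndpoint returns the Azure Resource Manager endpoint for a cloud.
// Cloud names are illustrative; ASH needs a user-supplied endpoint.
func armEndpoint(cloud, stackHubEndpoint string) (string, error) {
	switch cloud {
	case "AzurePublicCloud":
		return "https://management.azure.com/", nil
	case "AzureUSGovernmentCloud": // MAG
		return "https://management.usgovcloudapi.net/", nil
	case "AzureStackCloud": // ASH
		if stackHubEndpoint == "" {
			return "", fmt.Errorf("azure stack hub requires an explicit ARM endpoint")
		}
		return stackHubEndpoint, nil
	default:
		return "", fmt.Errorf("unknown cloud %q", cloud)
	}
}

func main() {
	for _, c := range []string{"AzurePublicCloud", "AzureUSGovernmentCloud"} {
		e, err := armEndpoint(c, "")
		fmt.Println(c, e, err)
	}
}
```

The same endpoint choice would also have to apply to the Resource Graph query used when removing shared tags during destroy.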

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2023-11-25-110147

How reproducible:

always

Steps to Reproduce:

1. Install a cluster on an existing VNet on MAG and ASH.

Actual results:

Installation failed.

Expected results:

Installation succeeded.

https://github.com/openshift/installer/pull/7768

Bug OCPBUGS-26195: regression - aws-ebs-csi-driver-node- fails to deploy too many times because of SCCs

View the Description View the linked PRs

This is a clone of issue OCPBUGS-25125. The following is the description of the original issue:
—
Description of problem:

 Recently, `aws-ebs-csi-driver-node-` has been failing to deploy far too often in CI.

Version-Release number of selected component (if applicable):

    4.14

How reproducible:

  in a statistically significant pattern

Steps to Reproduce:

    1. Run the OCP test suite enough times for the failure rate to become significant.

Actual results:

    fail [github.com/openshift/origin/test/extended/authorization/scc.go:76]: 1 pods failed before test on SCC errors
Error creating: pods "aws-ebs-csi-driver-node-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted-v2: .spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[3]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[4]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[5]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, provider restricted-v2: .containers[0].privileged: Invalid value: true: Privileged containers are not allowed, provider restricted-v2: .containers[0].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[0].containers[0].hostPort: Invalid value: 10300: Host ports are not allowed to be used, provider restricted-v2: .containers[1].privileged: Invalid value: true: Privileged containers are not allowed, provider restricted-v2: .containers[1].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[1].containers[0].hostPort: Invalid value: 10300: Host ports are not allowed to be used, provider restricted-v2: .containers[2].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[2].containers[0].hostPort: Invalid value: 10300: Host ports are not allowed to be used, provider "restricted": Forbidden: not usable by user or serviceaccount, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider 
"hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount] for DaemonSet.apps/v1/aws-ebs-csi-driver-node -n openshift-cluster-csi-drivers happened 4 times

Expected results:

The test passes.

Additional info:

Link to the regression dashboard - https://sippy.dptools.openshift.org/sippy-ng/component_readiness/capability?baseEndTime=2023-10-31%2023%3A59%3A59&baseRelease=4.14&baseStartTime=2023-10-04%2000%3A00%3A00&capability=SCC&component=oauth-apiserver&confidence=95&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&pity=5&sampleEndTime=2023-12-11%2023%3A59%3A59&sampleRelease=4.15&sampleStartTime=2023-12-05%2000%3A00%3A00

[sig-auth][Feature:SCC][Early] should not have pod creation failures during install [Suite:openshift/conformance/parallel]

https://github.com/openshift/csi-operator/pull/93

Bug OCPBUGS-20356: [4.15] Bootimage bump tracker

View the Description View the linked PRs

Tracker issue for bootimage bump in 4.15. This issue should block issues which need a bootimage bump to fix.

The previous bump was ~~OCPBUGS-18945~~.

https://github.com/openshift/installer/pull/7616

Bug OCPBUGS-22744: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/telemeter/pull/493

Bug OCPBUGS-24078: Update 4.15 ose-cluster-policy-controller-container image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/cluster-policy-controller/pull/143

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-policy-controller/pull/143

Bug OCPBUGS-25521: Update 4.15 ose-csi-external-snapshotter-container image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/csi-external-snapshotter/pull/134

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/csi-external-snapshotter/pull/134

Bug OCPBUGS-19167: Update 4.15 ose-olm-rukpak image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/operator-framework-rukpak/pull/34

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/operator-framework-rukpak/pull/35

Bug OCPBUGS-23263: Update i18next-parser dev dependency in console

View the Description View the linked PRs

https://github.com/openshift/console/pull/13330

Bug OCPBUGS-16514: OCP 4.14 | Execution of two oc tag commands in a row, creates wrong .image.dockerImageMetadata

View the Description View the linked PRs

Description of problem:

When I execute the following two tag commands in a row on OCP 4.14.0-ec.3, Multi-Arch:

  oc tag $IMAGE@$DIGEST_MANIFEST test-1:tag-manifest
  sleep 0
  oc tag $IMAGE@$DIGEST_MANIFEST test-1:tag-manifest-preserve-original --import-mode=PreserveOriginal

Then wrong data is written to the .image.dockerImageMetadata record.
If there is a delay between these two commands, e.g. sleep 5, then the image.dockerImageMetadata contains correct data.

Version-Release number of selected component (if applicable):

How reproducible:

Run the script below to see the error. If you set SLEEP_TIME=5, the script passes.

Steps to Reproduce:

#!/usr/bin/env bash
set -e
SLEEP_TIME=0     # The test fails when SLEEP_TIME is 0; use a delay of 3 seconds or more for it to pass

IMAGE="quay.io/podman/hello"
podman pull "${IMAGE}:latest"
DIGEST_MANIFEST=$(podman inspect "${IMAGE}:latest" | jq -r '.[0].Digest')

oc new-project "ir-test-001"
oc create imagestream test-1
oc import-image test-1 --from="${IMAGE}@${DIGEST_MANIFEST}" --import-mode='PreserveOriginal'

oc tag "${IMAGE}@${DIGEST_MANIFEST}" test-1:tag-manifest
sleep "${SLEEP_TIME}"
oc tag "${IMAGE}@${DIGEST_MANIFEST}" test-1:tag-manifest-preserve-original --import-mode=PreserveOriginal

sleep 5

[[ $(oc get istag test-1:tag-manifest-preserve-original -o json | jq -r '.image.dockerImageMetadata.Architecture') == "null" ]] && echo "pass: tag-manifest-preserve-original has no architecture" || echo "fail: tag-preserve-original has architecture and should not"

Actual results:

fail: tag-preserve-original has architecture and should not

oc get istag test-1:tag-manifest-preserve-original -o json | jq -r '.image.dockerImageMetadata.Architecture'
amd64

Expected results:

pass: tag-manifest-preserve-original has no architecture

oc get istag test-1:tag-manifest-preserve-original -o json | jq -r '.image.dockerImageMetadata.Architecture'
null

Additional info:

This was tested with the oc client on x86_64.

https://github.com/openshift/openshift-apiserver/pull/386

Bug OCPBUGS-23125: User can impersonate all users without the appropriate role binding

View the Description View the linked PRs

https://github.com/openshift/console/pull/13345

Bug OCPBUGS-19249: Update 4.15 ose-haproxy-router-base image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/router/pull/512

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/router/pull/512

Bug OCPBUGS-25982: [4.15] E2E Automation of Dynamic OVS Pinning

View the Description View the linked PRs

This is a clone of issue OCPBUGS-20368. The following is the description of the original issue:
—
Description of problem:

Automate E2E tests of Dynamic OVS Pinning. This bug was created to merge the following PR:

https://github.com/openshift/cluster-node-tuning-operator/pull/746

Version-Release number of selected component (if applicable):

4.15.0

https://github.com/openshift/cluster-node-tuning-operator/pull/904

Bug OCPBUGS-18504: CAPI E2Es: missing ControlPlaneEndpoint field in AWSCluster

View the Description View the linked PRs

Description of problem:

Currently, the CAPI Cluster object always stays in the `Provisioning` state because nothing sets the ControlPlaneEndpoint field on the object.
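To illustrate why the missing field matters, here is a simplified Go sketch of a phase computation in the spirit of Cluster API, where a Cluster only reaches Provisioned once its infrastructure is ready and ControlPlaneEndpoint has both a host and a port. The types and the clusterPhase function are a hypothetical simplification, not the upstream controller code.

```go
package main

import "fmt"

// APIEndpoint is a simplified stand-in for the CAPI endpoint type.
type APIEndpoint struct {
	Host string
	Port int32
}

// IsValid reports whether both host and port are set.
func (e APIEndpoint) IsValid() bool { return e.Host != "" && e.Port != 0 }

// clusterPhase is a hypothetical, simplified phase calculation: without a
// valid control plane endpoint the Cluster never leaves Provisioning.
func clusterPhase(infraReady bool, ep APIEndpoint) string {
	if infraReady && ep.IsValid() {
		return "Provisioned"
	}
	return "Provisioning"
}

func main() {
	fmt.Println(clusterPhase(true, APIEndpoint{}))
	fmt.Println(clusterPhase(true, APIEndpoint{Host: "api.example.com", Port: 6443}))
}
```

This is why setting ControlPlaneEndpoint before the Cluster object is created is enough to unblock the phase transition in the E2E tests.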

Version-Release number of selected component (if applicable):

all

How reproducible:

Always

Steps to Reproduce:

1. Run E2Es
2. See that Cluster always stays in Provisioning state

Actual results:

Cluster always stays in Provisioning state

Expected results:

Cluster should go into Provisioned state

Additional info:

We therefore need to update the E2E tests and the object creation scripts so that they set ControlPlaneEndpoint before the Cluster object is created, which lets the Cluster move to the Provisioned state.
This is a temporary workaround: we expect the creation of the Cluster and InfrastructureCluster objects, and the population of ControlPlaneEndpoint, to move into a dedicated controller within the operator.

https://github.com/openshift/cluster-capi-operator/pull/126

Bug OCPBUGS-21635: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/node_exporter/pull/133

Bug OCPBUGS-25242: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/gcp-pd-csi-driver-operator/pull/103

Bug MGMT-15949: Assisted-service crashes when creating agentclusterinstall without imageSetRef

View the Description View the linked PRs

Description of the problem:

Whenever an AgentClusterInstall is created without an imageSetRef, the assisted-service container crashes with a nil pointer dereference.

How reproducible:

100%

Steps to reproduce:

1. Create an AgentClusterInstall without an imageSetRef field.

Actual results:

assisted-service container crashes

Expected results:

The AgentClusterInstall is updated with a SpecSynced error, or sufficient defaults are applied.

Additional Information:

This appears to be because the following function does not check whether spec.ImageSetRef is nil: https://github.com/openshift/assisted-service/blob/91fcb5bc822de96602657efd883ed419bbb64963/internal/controller/controllers/clusterdeployments_controller.go#L1439C3-L1439C3

https://github.com/openshift/assisted-service/pull/5552

Bug OCPBUGS-21771: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/prometheus-alertmanager/pull/79

Bug OCPBUGS-25978: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/ibm-powervs-block-csi-driver/pull/71

Bug OCPBUGS-25237: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/azure-disk-csi-driver-operator/pull/115

Bug MGMT-15684: Custom Manifest - exception error for manifest containing space within name not aligned with other exceptions with invalid names

View the Description View the linked PRs

Description of the problem:
Minor issue:

While testing the API functions that add a manifest to a cluster, I noticed that invalid file names normally return:

status=422,
reason="Unprocessable Entity",

However, for the file name "sp ce.yaml" we get 400 instead of 422, with a generic Bad Request entity:

Reason: Bad Request
HTTP response headers: HTTPHeaderDict(

{'content-type': 'application/json', 'vary': 'Accept-Encoding,Origin', 'date': 'Thu, 31 Aug 2023 14:51:39 GMT', 'content-length': '177', 'x-envoy-upstream-service-time': '5', 'server': 'envoy', 'set-cookie': 'bd0de3dae0f495ebdb32e3693e2b9100=0f4b5982ace0eb64263ae6f95fd1452e; path=/; HttpOnly; Secure; SameSite=None'}

)
HTTP response body:

{"code":"400","href":"","id":400,"kind":"Error","reason":"Cluster manifest sp ce.yaml for cluster cbf119a6-29cc-4db8-aa76-65d2ca4b0a46 should not include a space in its name."}

It would be better to align this with the response returned in other invalid-name cases (for example, a file with an invalid extension or a file name that already exists), which is 422.
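A minimal Go sketch of the alignment being asked for, where every file-name problem maps to 422. validateManifestName is hypothetical, and the set of accepted extensions (.yaml, .yml, .json) is an assumption made for illustration, not taken from assisted-service.

```go
package main

import (
	"fmt"
	"net/http"
	"path/filepath"
	"strings"
)

// validateManifestName returns an HTTP status and message for a manifest
// file name. All name problems map to 422 Unprocessable Entity.
func validateManifestName(name string) (int, string) {
	if strings.Contains(name, " ") {
		return http.StatusUnprocessableEntity,
			fmt.Sprintf("manifest %q should not include a space in its name", name)
	}
	switch filepath.Ext(name) {
	case ".yaml", ".yml", ".json": // assumed list of accepted extensions
		return http.StatusOK, ""
	}
	return http.StatusUnprocessableEntity,
		fmt.Sprintf("manifest %q has an unsupported extension", name)
}

func main() {
	for _, n := range []string{"sp ce.yaml", "ok.yaml", "notes.txt"} {
		status, msg := validateManifestName(n)
		fmt.Println(n, status, msg)
	}
}
```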

How reproducible:

Steps to reproduce:

1. Try to create a manifest named "sp ce.yaml" via the v2_create_cluster_manifest API.

Actual results:

400, Bad Request

Expected results:

422, reason="Unprocessable Entity"

https://github.com/openshift/assisted-service/pull/5634

Bug OCPBUGS-16482: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Bug OCPBUGS-22204: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/baremetal-runtimecfg/pull/280

Task HOSTEDCP-1232: Prepare Hypershift for CAPI v1.5+

View the Description View the linked PRs

Prepare Hypershift for the CAPI bump to v1.5.2 (https://github.com/openshift/cluster-api/pull/181) so that hypershift-e2e can pass.

https://github.com/openshift/hypershift/pull/3087

Bug OCPBUGS-19193: Update 4.15 ose-kubevirt-cloud-controller-manager image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/cloud-provider-kubevirt/pull/25

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cloud-provider-kubevirt/pull/25

Bug OCPBUGS-19263: Update 4.15 ose-cluster-kube-controller-manager-operator image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/cluster-kube-controller-manager-operator/pull/747

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-kube-controller-manager-operator/pull/747

Bug OCPBUGS-13829: tokenConfig's accessTokenInactivityTimeout fields doesn't work in hypershift guest cluster

View the Description View the linked PRs

Description of problem:

The accessTokenInactivityTimeout configured under tokenConfig in the HostedCluster has no effect:
1. The value is not propagated to the oauth-openshift ConfigMap.
2. The HostedCluster allows the user to set accessTokenInactivityTimeout to a value < 300s, whereas on a master cluster the value must be > 300s.

Version-Release number of selected component (if applicable):

4.13

How reproducible:

Always

Steps to Reproduce:

1. Install a fresh 4.13 hypershift cluster  
2. Configure accessTokenInactivityTimeout as below:
$ oc edit hc -n clusters
...
  spec:
    configuration:
      oauth:
        identityProviders:
        ...
        tokenConfig:          
          accessTokenInactivityTimeout: 100s
...
3. Check the hcp:
$ oc get hcp -oyaml
...
        tokenConfig:           
          accessTokenInactivityTimeout: 1m40s
...

4. Login to guest cluster with testuser-1 and get the token
$ oc login https://a8890bba21c9b48d4a05096eee8d4edd-738276775c71fb8f.elb.us-east-2.amazonaws.com:6443 -u testuser-1 -p xxxxxxx
$ TOKEN=`oc whoami -t`
$ oc login --token="$TOKEN"
WARNING: Using insecure TLS client config. Setting this option is not supported!
Logged into "https://a8890bba21c9b48d4a05096eee8d4edd-738276775c71fb8f.elb.us-east-2.amazonaws.com:6443" as "testuser-1" using the token provided.
You don't have any projects. You can try to create a new project, by running
    oc new-project <projectname>

Actual results:

1. The HostedCluster allows the user to set accessTokenInactivityTimeout to a value < 300s, which is not possible on a master cluster.

2. The value is not updated in the oauth-openshift ConfigMap:
$ oc get cm oauth-openshift -oyaml -n clusters-hypershift-ci-25785 
...
      tokenConfig:
        accessTokenMaxAgeSeconds: 86400
        authorizeTokenMaxAgeSeconds: 300
...

3. Login does not fail even if the user has been inactive for longer than the configured accessTokenInactivityTimeout.

Expected results:

Login fails if the user has been inactive for longer than the configured accessTokenInactivityTimeout.

https://github.com/openshift/hypershift/pull/3025

Bug OCPBUGS-19094: Update 4.15 ose-multus-admission-controller image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/multus-admission-controller/pull/69

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/multus-admission-controller/pull/69

Bug OCPBUGS-24120: Update 4.15 ose-azure-disk-csi-driver-container image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/azure-disk-csi-driver/pull/64

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/azure-disk-csi-driver/pull/64

Bug OCPBUGS-24329: Update 4.15 ose-csi-snapshot-controller-container image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/csi-external-snapshotter/pull/121

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/csi-external-snapshotter/pull/121

Bug OCPBUGS-3356: HAproxy warning when httpCaptureCookies.maxLength exceeds 63 bytes

View the Description View the linked PRs

Description of problem:
I have a customer (IHAC) on OCP 4.9 who has configured the IngressControllers with a long httpLogFormat, and the routers print the following warnings every time they reload:

I0927 13:29:45.495077 1 router.go:612] template "msg"="router reloaded" "output"="[WARNING] 269/132945 (9167) : config : truncating capture length to 63 bytes for frontend 'public'.\n[WARNING] 269/132945 (9167) : config : truncating capture length to 63 bytes for frontend 'fe_sni'.\n[WARNING] 269/132945 (9167) : config : truncating capture length to 63 bytes for frontend 'fe_no_sni'.\n - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"

This is the IngressController configuration:

  logging:
    access:
      destination:
        syslog:
          address: 10.X.X.X
          port: 10514
        type: Syslog
      httpCaptureCookies:
      - matchType: Exact
        maxLength: 128
        name: ITXSESSIONID
      httpCaptureHeaders:
        request:
        - maxLength: 128
          name: Host
        - maxLength: 128
          name: itxrequestid
      httpLogFormat: actconn="%ac",backend_name="%b",backend_queue="%bq",backend_source_ip="%bi",backend_source_port="%bp",beconn="%bc",bytes_read="%B",bytes_uploaded="%U",captrd_req_cookie="%CC",captrd_req_headers="%hr",captrd_res_cookie="%CS",captrd_res_headers="%hs",client_ip="%ci",client_port="%cp",cluster="ieec1ocp1",datacenter="ieec1",environment="pro",fe_name_transport="%ft",feconn="%fc",frontend_name="%f",hostname="%H",http_version="%HV",log_type="http",method="%HM",query_string="%HQ",req_date="%tr",request="%HP",res_time="%TR",retries="%rc",server_ip="%si",server_name="%s",server_port="%sp",srv_queue="%sq",srv_conn="%sc",srv_queue="%sq",status_code="%ST",Ta="%Ta",Tc="%Tc",tenant="bk",term_state="%tsc",tot_wait_q="%Tw",Tr="%Tr"
      logEmptyRequests: Ignore

Is there any way to avoid this truncation warning?

How reproducible:
For every reload of haproxy config

Steps to Reproduce:
This can be reproduced easily with the following configuration in the default ingress controller:

  logging:
    access:
      destination:
        type: Container
      httpCaptureCookies:
      - matchType: Exact
        maxLength: 128
        name: _abck

Then access the console, and you will get a log entry like:

2022-10-18T14:13:53.068164+00:00 xxxx xxxxxx haproxy[38]: 10.39.192.203:40698 [18/Oct/2022:14:13:52.488] fe_sni~ be_secure:openshift-console:console/pod:console-5976495467-zxgxr:console:https:10.128.1.116:8443 0/0/0/10/580 200 1130598 _abck=B7EA642C9E828FA8210F329F80B7B2D80YAAQnVozuFVfkOaDAQAADk - --VN 78/37/33/33/0 0/0 "GET /api/kubernetes/openapi/v2 HTTP/1.1"
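The warning appears because the configured cookie maxLength (128) is larger than the 63 bytes the router log says HAProxy will keep. A minimal Python sketch that flags such entries before they reach the router, assuming the spec.logging.access section has already been loaded into a dict and assuming, as the warning suggests, a 63-byte cookie capture limit (the function name is hypothetical):

```python
from typing import Any

# Assumption, based on the bug title and the router warning above:
# HAProxy truncates captured cookie values to 63 bytes.
HAPROXY_MAX_COOKIE_CAPTURE = 63


def oversized_cookie_captures(access: dict[str, Any]) -> list[tuple[str, int]]:
    """Return (name, maxLength) for each httpCaptureCookies entry that
    HAProxy would truncate, given spec.logging.access as a dict."""
    return [
        (cookie["name"], cookie["maxLength"])
        for cookie in access.get("httpCaptureCookies", [])
        if cookie.get("maxLength", 0) > HAPROXY_MAX_COOKIE_CAPTURE
    ]


# The reproducer configuration above, expressed as a Python dict.
access = {
    "destination": {"type": "Container"},
    "httpCaptureCookies": [{"matchType": "Exact", "maxLength": 128, "name": "_abck"}],
}
print(oversized_cookie_captures(access))  # [('_abck', 128)]
```

Under that assumption, setting maxLength to 63 or less should avoid the reload warning; the captured value is cut at 63 bytes either way.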

https://github.com/openshift/cluster-ingress-operator/pull/871

Bug OCPBUGS-24215: kube-controller-manager TLS artifacts should have ownership annotations

View the linked PRs

https://github.com/openshift/cluster-kube-controller-manager-operator/pull/769

Bug OCPBUGS-17060: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-image-registry-operator/pull/908

Bug OCPBUGS-18317: ovnkube-node requires namespaces/status permissions in interconnect

View the Description View the linked PRs

With interconnect (IC), ovnkube-node requires namespaces/status permissions.

After talking to Tim Rozet, it seems that this is not necessary. We previously used that approach because ovnkube-node only listened for local pods, so it needed to learn this information/event from a remote gateway pod. Now that ovnkube-node watches all pods, it can simply listen for the remote pod and then sync conntrack.

https://github.com/openshift/ovn-kubernetes/pull/1917

Bug OCPBUGS-24165: Update 4.15 ose-apiserver-network-proxy-container image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/apiserver-network-proxy/pull/44

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/apiserver-network-proxy/pull/44

Bug OCPBUGS-19288: Update 4.15 ovn-kubernetes-microshift image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/ovn-kubernetes/pull/1883

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/ovn-kubernetes/pull/1883

Bug OCPBUGS-21729: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-kube-apiserver-operator/pull/1567

Bug OCPBUGS-24160: Update 4.15 ose-csi-snapshot-validation-webhook-container image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/csi-external-snapshotter/pull/116

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/csi-external-snapshotter/pull/116

Bug OCPBUGS-24124: Update 4.15 ose-alibaba-machine-controllers-container image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/cluster-api-provider-alibaba/pull/46

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-api-provider-alibaba/pull/47

Story MGMT-15405: Publish the static network download URL to the infraenv debug info

View the Description View the linked PRs

~~MGMT-11443~~ added an API for users to download the rendered nmconnection files used in the ISO, but when using the kube-api that URL isn't given to the user.

This should be added to the infraenv status in the debug info section.

https://github.com/openshift/assisted-service/pull/5638

Bug OCPBUGS-16634: [OVN-Kubernetes] IP currently assigned to multiple pods

View the Description View the linked PRs

{  2023-07-19T16:52:37Z reason/ReusedPodIP podIP 10.128.0.39 is currently assigned to multiple pods: ns/e2e-replicaset-4951 pod/test-rs-ddhkn node/ip-10-0-151-233.us-west-1.compute.internal uid/117115dd-dc8f-4333-b972-ed880fcf8dd9;ns/openshift-apiserver pod/apiserver-5f7d4599b4-dvpdk node/ip-10-0-151-233.us-west-1.compute.internal uid/293cba9c-11ea-4258-9d38-4ff5b2cb52bd
2023-07-19T16:58:40Z reason/ReusedPodIP podIP 10.128.0.39 is currently assigned to multiple pods: ns/e2e-job-1076 pod/pod-disruption-failure-ignore-2-qlxp2 node/ip-10-0-151-233.us-west-1.compute.internal uid/3dda8eea-b221-433a-b254-fc7cf487189b;ns/openshift-apiserver pod/apiserver-5f7d4599b4-dvpdk node/ip-10-0-151-233.us-west-1.compute.internal uid/293cba9c-11ea-4258-9d38-4ff5b2cb52bd}

I0719 16:44:56.659916   49761 base_network_controller_pods.go:444] [default/openshift-apiserver/apiserver-5f7d4599b4-dvpdk] creating logical port openshift-apiserver_apiserver-5f7d4599b4-dvpdk for pod on switch ip-10-0-151-233.us-west-1.compute.internal

W0719 16:44:56.666407   49761 base_network_controller_pods.go:198] No cached port info for deleting pod default/openshift-kube-controller-manager/installer-7-ip-10-0-151-233.us-west-1.compute.internal. Using logical switch ip-10-0-151-233.us-west-1.compute.internal port uuid  and addrs [10.128.0.39/23]

I0719 16:44:56.680604   49761 base_network_controller_pods.go:234] Releasing IPs for Completed pod: openshift-kube-controller-manager/installer-7-ip-10-0-151-233.us-west-1.compute.internal, ips: 10.128.0.39

I0719 16:44:56.699279   49761 pods.go:134] Attempting to release IPs for pod: openshift-kube-controller-manager/installer-7-ip-10-0-151-233.us-west-1.compute.internal, ips: 10.128.0.39

I0719 16:44:56.790903   49761 client.go:783]  "msg"="transacting operations" "database"="OVN_Northbound" "operations"="[{Op:insert Table:Logical_Switch_Port Row:map[addresses:{GoSet:[0a:58:0a:80:00:27 10.128.0.39]} external_ids:{GoMap:map[namespace:openshift-apiserver pod:true]} name:openshift-apiserver_apiserver-5f7d4599b4-dvpdk

Observed in
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-net[…]perator-master-e2e-aws-ovn-single-node/1681699276796727296

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_clus[…]netes_ovnkube-node-bsbt9_ovnkube-controller.log
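The check that fired above reports an IP that is held by more than one live pod at the same time. A rough Python sketch of that idea, not the actual origin monitor code. The Pod tuple is hypothetical, and so is the rule that Succeeded/Failed pods have already released their IP:

```python
from collections import defaultdict
from typing import NamedTuple


class Pod(NamedTuple):
    namespace: str
    name: str
    ip: str
    phase: str  # Kubernetes pod phase, e.g. "Running" or "Succeeded"


# Pods in these phases are assumed to have released their IP already.
TERMINAL_PHASES = {"Succeeded", "Failed"}


def reused_pod_ips(pods: list[Pod]) -> dict[str, list[str]]:
    """Map each IP held by more than one non-terminated pod to those pods."""
    holders: dict[str, list[str]] = defaultdict(list)
    for pod in pods:
        if pod.phase not in TERMINAL_PHASES:
            holders[pod.ip].append(f"{pod.namespace}/{pod.name}")
    return {ip: names for ip, names in holders.items() if len(names) > 1}


pods = [
    Pod("openshift-apiserver", "apiserver-5f7d4599b4-dvpdk", "10.128.0.39", "Running"),
    Pod("openshift-kube-controller-manager",
        "installer-7-ip-10-0-151-233.us-west-1.compute.internal", "10.128.0.39", "Succeeded"),
    Pod("e2e-replicaset-4951", "test-rs-ddhkn", "10.128.0.39", "Running"),
]
print(reused_pod_ips(pods))
```

The completed installer pod is ignored, so only the apiserver pod and the e2e pod are reported as sharing 10.128.0.39.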

https://github.com/openshift/ovn-kubernetes/pull/1942

Bug OCPBUGS-19092: Enable console on OCI

View the Description View the linked PRs

When creating an Agent ISO for OCI, we should add the kernel argument console=ttyS0 to the ISO/PXE kargs.

CoreOS does not include a console argument by default when using metal as the platform, because different hardware has different consoles and specifying one can cause booting to fail on some machines; it does include one on many cloud platforms. Since we know when the user is definitely using OCI (validations in assisted ensure it) and we know the correct settings for OCI, we should set them up automatically.
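A minimal Python sketch of the proposed rule: add console=ttyS0 only when the platform is known to be OCI. This is not the installer's Go code. The platform identifier "oci" and the function name are assumptions:

```python
# Assumed platform identifier; the bug only says "OCI".
OCI_PLATFORM = "oci"
OCI_CONSOLE_KARG = "console=ttyS0"


def agent_iso_kargs(platform: str, kargs: list[str]) -> list[str]:
    """Return the ISO/PXE kernel arguments, adding the serial console on OCI.

    Other platforms (e.g. metal) are left alone, because forcing a console
    can prevent some hardware from booting.
    """
    result = list(kargs)
    # Design choice of this sketch: respect any console= the user already set.
    if platform == OCI_PLATFORM and not any(k.startswith("console=") for k in result):
        result.append(OCI_CONSOLE_KARG)
    return result


print(agent_iso_kargs("oci", ["rd.neednet=1"]))        # ['rd.neednet=1', 'console=ttyS0']
print(agent_iso_kargs("baremetal", ["rd.neednet=1"]))  # ['rd.neednet=1']
```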

https://github.com/openshift/installer/pull/7511

Story CCO-192: Make CCO use only Lease object for leader election

View the Description View the linked PRs

This needs to wait until 4.12 branches, which should be June 24 per https://lists.corp.redhat.com/archives/aos-hive/2022-April/000006.html

https://github.com/openshift/cloud-credential-operator/pull/627

Bug OCPBUGS-20096: pause image is still on RHEL 8

View the Description View the linked PRs

Description of problem:

Recently we bumped the hyperkube image [1] to use both RHEL 9 builder and base images.

In order to keep things consistent, we tried to do the same with the "pause" image [2], however, that caused mass failures in payload jobs [3] due to a mismatch with ART [4], which still builds that image with RHEL 8.

As a result, we decided to keep builder & base images for "pause" in RHEL 8, as this work was not required for the kube 1.28 bump nor the FIPS issue we were addressing.

However, for the sake of consistency, eventually it'd be good to bump the "pause" builder & base images to RHEL 9.

[1] https://github.com/openshift/kubernetes/blob/6ab54b8d9a0ea02856efd3835b6f9df5da9ce115/openshift-hack/images/hyperkube/Dockerfile.rhel#L1

[2] https://github.com/openshift/kubernetes/blob/6ab54b8d9a0ea02856efd3835b6f9df5da9ce115/build/pause/Dockerfile.Rhel#L1

[3] https://github.com/openshift/kubernetes/blob/6ab54b8d9a0ea02856efd3835b6f9df5da9ce115/build/pause/Dockerfile.Rhel#L1

[4] https://github.com/openshift-eng/ocp-build-data/blob/openshift-4.15/images/openshift-enterprise-pod.yml

Version-Release number of selected component (if applicable):

4.15

How reproducible:

Always

Actual results:

Builder & base images for "pause" are RHEL 8.

Expected results:

Builder & base images for "pause" are RHEL 9.

Additional info:

https://github.com/openshift/kubernetes/pull/1734

Bug OCPBUGS-24014: Reduce shared informer memory usage

View the Description View the linked PRs

Reduce shared informer memory usage by stripping object fields we don't care about.
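In client-go, a transform function registered with SharedIndexInformer.SetTransform can trim each object before it is stored in the cache. The Python sketch below models the same idea on plain dicts. The bug does not say which fields ovn-kubernetes actually strips, so dropping metadata.managedFields is an assumption:

```python
from typing import Any

# Assumed list of fields to drop; the bug does not say which fields are stripped.
METADATA_FIELDS_TO_DROP = ("managedFields",)


def strip_for_cache(obj: dict[str, Any]) -> dict[str, Any]:
    """Return a shallow copy of a Kubernetes object without fields the
    controller never reads, so the informer cache holds less data."""
    trimmed = dict(obj)
    if "metadata" in obj:
        metadata = dict(obj["metadata"])
        for field in METADATA_FIELDS_TO_DROP:
            metadata.pop(field, None)
        trimmed["metadata"] = metadata
    return trimmed


pod = {"metadata": {"name": "web", "managedFields": [{"manager": "kubelet"}]}, "spec": {}}
print(strip_for_cache(pod))  # {'metadata': {'name': 'web'}, 'spec': {}}
```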

https://github.com/openshift/ovn-kubernetes/pull/1962

Bug OCPBUGS-25690: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-version-operator/pull/1008

Bug OCPBUGS-22113: ARO builds should not generate azure-cloud-provider credentials in Manual mode

View the Description View the linked PRs

Description of problem:

Version-Release number of selected component (if applicable):

4.14.0 and 4.15.0

How reproducible:

Every time.

Steps to Reproduce:

1. git clone https://github.com/openshift/installer.git
2. export TAGS=aro
3. hack/build.sh
4. export OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE="${RELEASE_IMAGE}"
5. export OPENSHIFT_INSTALL_INVOKER="ARO"
6. Run ccoctl to generate ID resources
7. ./openshift-install create manifests
8. ./openshift-install create cluster --log-level=debug

Actual results:

The azure-cloud-provider credentials are generated with aadClientId set to the client ID of the service principal used by the installer.

Expected results:

This step should be skipped and kube-controller-manager should rely on file assets.

Additional info:

Open pull request: https://github.com/openshift/installer/pull/7608

https://github.com/openshift/installer/pull/7608

Bug OCPBUGS-25227: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-ingress-operator/pull/1005

Bug OCPBUGS-26066: Regression: [sig-arch] events should not repeat pathologically for ns/openshift-operator-lifecycle-manager

View the Description View the linked PRs

This is a clone of issue OCPBUGS-25830. The following is the description of the original issue:
—
Component Readiness has found a potential regression in [sig-arch] events should not repeat pathologically for ns/openshift-operator-lifecycle-manager.

Probability of significant regression: 100.00%

Sample (being evaluated) Release: 4.15
Start Time: 2023-12-05T00:00:00Z
End Time: 2023-12-11T23:59:59Z
Success Rate: 94.30%
Successes: 248
Failures: 15
Flakes: 0

Base (historical) Release: 4.14
Start Time: 2023-10-04T00:00:00Z
End Time: 2023-10-31T23:59:59Z
Success Rate: 100.00%
Successes: 730
Failures: 0
Flakes: 0
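Both reported success rates are consistent with successes / (successes + failures), and neither window has any flakes. A quick Python check of that arithmetic. How Sippy weights flakes is not shown here, so the formula is an assumption:

```python
def success_rate(successes: int, failures: int) -> float:
    """Pass percentage, assuming rate = successes / (successes + failures)."""
    return 100.0 * successes / (successes + failures)


sample = success_rate(248, 15)  # 4.15 sample window
base = success_rate(730, 0)     # 4.14 base window
print(f"sample {sample:.2f}%, base {base:.2f}%")  # sample 94.30%, base 100.00%
```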

View the test details report at https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&baseEndTime=2023-10-31%2023%3A59%3A59&baseRelease=4.14&baseStartTime=2023-10-04%2000%3A00%3A00&capability=Other&component=OLM&confidence=95&environment=ovn%20upgrade-minor%20amd64%20aws%20standard&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=ovn&pity=5&platform=aws&sampleEndTime=2023-12-11%2023%3A59%3A59&sampleRelease=4.15&sampleStartTime=2023-12-05%2000%3A00%3A00&testId=openshift-tests-upgrade%3A480dd81bbb3ca53f8daa59222281fea8&testName=%5Bsig-arch%5D%20events%20should%20not%20repeat%20pathologically%20for%20ns%2Fopenshift-operator-lifecycle-manager&upgrade=upgrade-minor&variant=standard

https://github.com/openshift/operator-framework-olm/pull/648

Bug OCPBUGS-24141: Update 4.15 ose-vmware-vsphere-csi-driver-container image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/vmware-vsphere-csi-driver/pull/100

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/vmware-vsphere-csi-driver/pull/100

Bug OCPBUGS-24168: Update 4.15 ose-machine-api-provider-azure-container image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/machine-api-provider-azure/pull/86

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/machine-api-provider-azure/pull/86

Bug OCPBUGS-16788: The file permissions of /var/lib/cni/networks/openshift-sdn in all sdn pods should be updated to 600 to conform with CIS benchmarks

View the Description View the linked PRs

Description of problem:

Observation from the CIS v1.4 PDF:
1.1.9 Ensure that the Container Network Interface file permissions are set to 600 or more restrictive

"Container Network Interface provides various networking options for overlay networking.
You should consult their documentation and restrict their respective file permissions to maintain the integrity of those files. Those files should be writable by only the administrators on the system."
 
To conform with the CIS benchmarks, the permissions of the /var/lib/cni/networks/openshift-sdn files in all sdn pods should be changed to 600.
$ for i in $(oc get pods -n openshift-sdn -l app=sdn -oname); do oc exec -n openshift-sdn $i -- find /var/lib/cni/networks/openshift-sdn -type f -exec stat -c %a {} \;; done
Defaulted container "sdn" out of: sdn, kube-rbac-proxy
644
644
644
644
644
644
644
644
644
644
644
644
644
Defaulted container "sdn" out of: sdn, kube-rbac-proxy
644
644
644
644
644
644
644
644
644
644
644
644
644
Defaulted container "sdn" out of: sdn, kube-rbac-proxy
644
644
644
644
644
644
644
644
644
644
644
644
Defaulted container "sdn" out of: sdn, kube-rbac-proxy
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
Defaulted container "sdn" out of: sdn, kube-rbac-proxy
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
Defaulted container "sdn" out of: sdn, kube-rbac-proxy
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644
644

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-07-20-215234

How reproducible:

Always

Actual results:

The file permissions for the /var/lib/cni/networks/openshift-sdn files in all sdn pods are 644.

Expected results:

The file permissions for /var/lib/cni/networks/openshift-sdn files in all sdn pods should be updated to 600
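The audit in the description can be written as a small Python sketch. It mirrors the find ... -exec stat -c %a loop and then tightens any file that grants more than 600. It runs against a local directory rather than inside the sdn pods, and it is not the fix merged in openshift/sdn#584. The file name in the demo is hypothetical:

```python
import os
import stat
import tempfile
from pathlib import Path

REQUIRED_MODE = 0o600  # CIS 1.1.9: 600 or more restrictive


def too_permissive(directory: str) -> list[tuple[str, str]]:
    """Return (path, mode) for regular files granting any bit beyond 0o600."""
    offenders = []
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file():
            mode = stat.S_IMODE(path.stat().st_mode)
            if mode & ~REQUIRED_MODE:  # e.g. 0o644 has group/other read bits set
                offenders.append((str(path), oct(mode)))
    return offenders


def tighten(directory: str) -> None:
    """chmod every offending file to 0o600."""
    for path, _ in too_permissive(directory):
        os.chmod(path, REQUIRED_MODE)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        lease = os.path.join(d, "10.128.0.5")  # hypothetical IPAM file name
        Path(lease).touch()
        os.chmod(lease, 0o644)
        print(too_permissive(d))  # one entry with mode '0o644'
        tighten(d)
        print(too_permissive(d))  # []
```

A file at 0o400 passes the check, because "600 or more restrictive" only forbids extra bits.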

Additional info:

https://github.com/openshift/sdn/pull/584

Bug OCPBUGS-24067: Update 4.15 golang-github-openshift-oauth-proxy-container image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/oauth-proxy/pull/269

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/oauth-proxy/pull/269

Bug OCPBUGS-24170: Update 4.15 ose-powervs-cloud-controller-manager-container image to be consistent with ART

View the Description View the linked PRs

https://github.com/openshift/cloud-provider-powervs/pull/61

Bug OCPBUGS-18267: '404: Not Found' is shown on the Knative-serving Details page

View the Description View the linked PRs

Description of problem:

'404: Not Found' is shown on the Knative-serving Details page.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-06-13-223353

How reproducible:

Always

Steps to Reproduce:

1. Install the 'Serverless' Operator, make sure the operator has been installed successfully, and make sure the Knative Serving instance is created without any errors
2. Navigate to Administration -> Cluster Settings -> Global Configuration
3. Go to the Knative-serving Details page and check whether the 404 Not Found message appears

Actual results:

The page shows 404 Not Found.

Expected results:

The 404 Not Found page should not be shown.

Additional info:

The dependency ticket is OCPBUGS-15008; more information can be found in its comments.

https://github.com/openshift/console/pull/13156

Bug OCPBUGS-24252: [UI] ODF installation console in an AWS STS enabled cluster has issues

View the Description View the linked PRs

In an AWS STS-enabled OCP cluster, the OpenShift Data Foundation installation wizard provides a field for entering the role ARN. However, this field does not accept any values: as soon as anything is typed, it is auto-populated with [object Object], and after that nothing can be added or pasted into it.

We tried inspecting the page and adding the element; on pressing the Install button, it throws the error below:

An error occurred

Converting circular structure to JSON --> starting at object with constructor 'HTMLInputElement' | property '__reactFiber$rrh47yimfa' -> object with constructor 'Lu' --- property 'stateNode' closes the circle

https://github.com/openshift/console/pull/13416

Bug OCPBUGS-8764: [IPI Baremetal] The host doesn't power off upon removal during scale down.

View the Description View the linked PRs

The host doesn't power off upon removal during scale down.

Version: 4.4.0-0.nightly-2020-01-09-013524

Steps to reproduce:

Starting with 3 workers:
[kni@worker-2 ~]$ oc get bmh -n openshift-machine-api
NAME STATUS PROVISIONING STATUS CONSUMER BMC HARDWARE PROFILE ONLINE ERROR
openshift-master-0 OK externally provisioned ocp-edge-cluster-master-0 ipmi://192.168.123.1:6230 true
openshift-master-1 OK externally provisioned ocp-edge-cluster-master-1 ipmi://192.168.123.1:6231 true
openshift-master-2 OK externally provisioned ocp-edge-cluster-master-2 ipmi://192.168.123.1:6232 true
openshift-worker-0 OK provisioned ocp-edge-cluster-worker-0-d2fvm ipmi://192.168.123.1:6233 unknown true
openshift-worker-5 OK provisioned ocp-edge-cluster-worker-0-ptklp ipmi://192.168.123.1:6245 unknown true
openshift-worker-9 OK provisioned ocp-edge-cluster-worker-0-jb2tm ipmi://192.168.123.1:6239 unknown true

[kni@worker-2 ~]$ oc get machine -n openshift-machine-api
NAME PHASE TYPE REGION ZONE AGE
ocp-edge-cluster-master-0 4d4h
ocp-edge-cluster-master-1 4d4h
ocp-edge-cluster-master-2 4d4h
ocp-edge-cluster-worker-0-d2fvm 146m
ocp-edge-cluster-worker-0-jb2tm 11m
ocp-edge-cluster-worker-0-ptklp 3h54m

[kni@worker-2 ~]$ oc get node
NAME STATUS ROLES AGE VERSION
master-0 Ready master 4d4h v0.0.0-master+$Format:%h$
master-1 Ready master 4d4h v0.0.0-master+$Format:%h$
master-2 Ready master 4d4h v0.0.0-master+$Format:%h$
worker-0 Ready worker 18m v0.0.0-master+$Format:%h$
worker-5 Ready worker 18m v0.0.0-master+$Format:%h$
worker-9 Ready worker 5m2s v0.0.0-master+$Format:%h$

Adding an annotation to mark the proper machine for deletion:
oc annotate machine ocp-edge-cluster-worker-0-jb2tm machine.openshift.io/cluster-api-delete-machine=yes -n openshift-machine-api
machine.machine.openshift.io/ocp-edge-cluster-worker-0-jb2tm annotated

Deleting the bmh:
[kni@worker-2 ~]$ oc delete bmh openshift-worker-9 -n openshift-machine-api
baremetalhost.metal3.io "openshift-worker-9" deleted

Scaling down the number of replicas:
[kni@worker-2 ~]$ oc scale machineset -n openshift-machine-api ocp-edge-cluster-worker-0 --replicas=2
machineset.machine.openshift.io/ocp-edge-cluster-worker-0 scaled

The entry (worker-9) got removed as expected:
[kni@worker-2 ~]$ oc get node
NAME STATUS ROLES AGE VERSION
master-0 Ready master 4d4h v0.0.0-master+$Format:%h$
master-1 Ready master 4d4h v0.0.0-master+$Format:%h$
master-2 Ready master 4d4h v0.0.0-master+$Format:%h$
worker-0 Ready worker 28m v0.0.0-master+$Format:%h$
worker-5 Ready worker 28m v0.0.0-master+$Format:%h$
[kni@worker-2 ~]$ oc get machine -n openshift-machine-api
NAME PHASE TYPE REGION ZONE AGE
ocp-edge-cluster-master-0 4d4h
ocp-edge-cluster-master-1 4d4h
ocp-edge-cluster-master-2 4d4h
ocp-edge-cluster-worker-0-d2fvm 156m
ocp-edge-cluster-worker-0-ptklp 4h5m

[kni@worker-2 ~]$ oc get bmh -n openshift-machine-api
NAME STATUS PROVISIONING STATUS CONSUMER BMC HARDWARE PROFILE ONLINE ERROR
openshift-master-0 OK externally provisioned ocp-edge-cluster-master-0 ipmi://192.168.123.1:6230 true
openshift-master-1 OK externally provisioned ocp-edge-cluster-master-1 ipmi://192.168.123.1:6231 true
openshift-master-2 OK externally provisioned ocp-edge-cluster-master-2 ipmi://192.168.123.1:6232 true
openshift-worker-0 OK provisioned ocp-edge-cluster-worker-0-d2fvm ipmi://192.168.123.1:6233 unknown true
openshift-worker-5 OK provisioned ocp-edge-cluster-worker-0-ptklp ipmi://192.168.123.1:6245 unknown true

Yet, when I try to connect to the deleted node, it is still up and running.

Expected result:
The removed node should have been powered off automatically.

https://github.com/openshift/baremetal-operator/pull/315

Bug OCPBUGS-15253: Add namespace to IngressWithoutClassName and UnmanagedRoutes alert message

View the Description View the linked PRs

Description of problem:

Including the namespace in the message for these alerts would make debugging easier: https://github.com/openshift/cluster-ingress-operator/blob/master/manifests/0000_90_ingress-operator_03_prometheusrules.yaml#L69

Version-Release number of selected component (if applicable):

4.12.x

How reproducible:

Always

Actual results:

No namespace in the alert message

Expected results:

Additional info:

https://github.com/openshift/route-controller-manager/pull/35

Bug OCPBUGS-22924: e2e-ibmcloud-csi is failing too much

View the Description View the linked PRs

In our CI, pre-submit jobs for the IBM VPC CSI driver and its operator are failing with:

[sig-arch] events should not repeat pathologically for ns/openshift-cluster-csi-drivers (0s): 16 events happened too frequently

event happened 25 times, something is wrong: ns/openshift-cluster-csi-drivers pod/ibm-vpc-block-csi-node-vck82 node/ci-op-jsqf19qs-00b5a-mjg8w-master-1 hmsg/99d84ba4c3 - pathological/true reason/FailedToRetrieveImagePullSecret Unable to retrieve some image pull secrets (bluemix-default-secret, bluemix-default-secret-regional, bluemix-default-secret-international, icr-io-secret); attempting to pull the image may not succeed. From: 06:44:57Z To: 06:44:58Z result=reject

Example:

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_ibm-vpc-block-csi-driver-operator/88/pull-ci-openshift-ibm-vpc-block-csi-driver-operator-master-e2e-ibmcloud-csi/1720315305915322368

Operator CI:

https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-ibm-vpc-block-csi-driver-operator-master-e2e-ibmcloud-csi

Driver CI:

https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-ibm-vpc-block-csi-driver-master-e2e-ibmcloud-csi

The driver itself appears to be working, so this is probably just a transient but annoying error.
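The failing check counts how often the same event repeats. Below is a hedged Python sketch of that counting. The threshold is hypothetical, since the log only shows that 25 repeats were rejected. The set of tolerated reasons is optional, and the Event tuple is also hypothetical. Whether the merged fix allowlists this reason or stops the event at its source is not stated here:

```python
from collections.abc import Collection, Iterable
from typing import NamedTuple

# Hypothetical threshold: the log only shows that 25 repeats were rejected.
PATHOLOGICAL_THRESHOLD = 20


class Event(NamedTuple):
    namespace: str
    involved_object: str
    reason: str
    count: int  # number of times the event repeated in the window


def pathological(
    events: Iterable[Event],
    threshold: int = PATHOLOGICAL_THRESHOLD,
    ignored_reasons: Collection[str] = frozenset(),
) -> list[Event]:
    """Events repeated more than `threshold` times, minus tolerated reasons."""
    return [e for e in events if e.count > threshold and e.reason not in ignored_reasons]


events = [Event("openshift-cluster-csi-drivers", "pod/ibm-vpc-block-csi-node-vck82",
                "FailedToRetrieveImagePullSecret", 25)]
print(len(pathological(events)))  # 1
print(len(pathological(events, ignored_reasons={"FailedToRetrieveImagePullSecret"})))  # 0
```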

https://github.com/openshift/ibm-vpc-block-csi-driver-operator/pull/87

Bug OCPBUGS-24171: Update 4.15 csi-driver-manila-container image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/cloud-provider-openstack/pull/250

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cloud-provider-openstack/pull/250

Bug OCPBUGS-24276: include network-tools in pre-dispatch script

View the Description View the linked PRs

In the Python script used during bug pre-dispatch, include the "networking / network-tools" component.

https://github.com/openshift/network-tools/pull/102

Bug OCPBUGS-3680: 4.15: Upgrade blocked: csi-snapshot-controller fails with read-only filesystem

View the Description View the linked PRs

Description of problem:

The OCP upgrade is blocked because the csi-snapshot-controller cluster operator fails to start its deployment, with a fatal read-only filesystem message.

Version-Release number of selected component (if applicable):

Red Hat OpenShift 4.11
rhacs-operator.v3.72.1

How reproducible:

At least once in user's cluster while upgrading

Steps to Reproduce:

1. Have OCP 4.11 installed
2. Install ACS on top of the OCP cluster
3. Upgrade OCP to the next z-stream version

Actual results:

Upgrade gets blocked: waiting on csi-snapshot-controller

Expected results:

Upgrade should succeed

Additional info:

The stackrox SCCs (stackrox-admission-control, stackrox-collector and stackrox-sensor) set `readOnlyRootFilesystem` to `true`. If an SCC is not explicitly defined/requested, other pods might receive one of these SCCs, which makes their deployment fail with a `read-only filesystem` message.

https://github.com/openshift/cluster-csi-snapshot-controller-operator/pull/159

Bug OCPBUGS-22489: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/installer/pull/7643

Bug OCPBUGS-17287: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-capi-operator/pull/132

Bug OCPBUGS-19411: cluster-autoscaler-operator clusterrole needs watch on clusteroperators

View the Description View the linked PRs

Steps to Reproduce:

1. oc -n openshift-machine-api get role/cluster-autoscaler-operator -o yaml
2. Observe the missing watch verb
3. Tail the cluster-autoscaler logs to see the error:

status.go:444] No ClusterAutoscaler. Reporting available.
I0919 16:40:52.877216       1 status.go:244] Operator status available: at version 4.14.0-rc.1
E0919 16:40:53.719592       1 reflector.go:148] github.com/openshift/client-go/config/informers/externalversions/factory.go:101: Failed to watch *v1.ClusterOperator: unknown (get clusteroperators.config.openshift.io)
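A client-go informer lists and then watches its resource, so an RBAC rule without watch produces the "Failed to watch" error above. A hypothetical Python helper that reports which verbs are missing for a group/resource pair. It ignores resourceNames and role aggregation:

```python
from typing import Any

# A client-go informer/reflector lists and then watches the resource.
INFORMER_VERBS = {"list", "watch"}


def missing_verbs(rules: list[dict[str, Any]], api_group: str, resource: str,
                  required: set[str] = INFORMER_VERBS) -> set[str]:
    """Return the required verbs that no RBAC rule grants on group/resource."""
    granted: set[str] = set()
    for rule in rules:
        groups = rule.get("apiGroups", [])
        resources = rule.get("resources", [])
        if (api_group in groups or "*" in groups) and (resource in resources or "*" in resources):
            granted.update(rule.get("verbs", []))
    if "*" in granted:  # a wildcard verb grants everything
        return set()
    return required - granted


rules = [{"apiGroups": ["config.openshift.io"], "resources": ["clusteroperators"],
          "verbs": ["get", "list"]}]
print(missing_verbs(rules, "config.openshift.io", "clusteroperators"))  # {'watch'}
```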

https://github.com/openshift/cluster-autoscaler-operator/pull/287

Bug OCPBUGS-21735: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/openshift-apiserver/pull/396

Bug OCPBUGS-17288: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/248

Bug OCPBUGS-19289: Update 4.15 ose-ovn-kubernetes image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/ovn-kubernetes/pull/1884

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/ovn-kubernetes/pull/1884

Bug OCPBUGS-19823: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/assisted-service/pull/5565

Bug OCPBUGS-19900: hybrid nodes have permissions error setting annotations

View the Description View the linked PRs

Description of problem: Updating the ovn-kubernetes submodules in the windows-machine-config-operator causes nodes to hit permission errors when setting annotations.

E0927 19:37:53.178022    4932 kube.go:130] Error in setting annotation on node ci-op-56c3qr7h-8411c-wdmq9-e2e-wm-xs6sc: admission webhook "node.network-node-identity.openshift.io" denied the request: user "system:node:ci-op-56c3qr7h-8411c-wdmq9-e2e-wm-xs6sc" is not allowed to set the following annotations on node: "ci-op-56c3qr7h-8411c-wdmq9-e2e-wm-xs6sc": [k8s.ovn.org/hybrid-overlay-distributed-router-gateway-mac]

Seen in:
https://github.com/openshift/windows-machine-config-operator/pull/1836

https://github.com/openshift/ovn-kubernetes/pull/1919

Story CCO-437: Document conversion from passthrough to manual Azure AD Workload Identity credentials

View the linked PRs

https://github.com/openshift/cloud-credential-operator/pull/598

Bug OCPBUGS-20440: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/installer/pull/7582

Bug OCPBUGS-19054: No warning that TechPreview is not supported by agent installer

Description of problem:

The agent-based installer does not support the TechPreviewNoUpgrade featureSet and, by extension, does not support any of the features gated by it. Because of this, there is no warning when one of these features is specified; we expect the TechPreviewNoUpgrade feature gate check to error out when any of them is used.

However, we do not warn that TechPreviewNoUpgrade itself is ignored, so a user who specifies it can use some of these unsupported features without being warned that their configuration is ignored.

We should fail with an error when TechPreviewNoUpgrade is specified, until such time as AGENT-554 is implemented.

https://github.com/openshift/installer/pull/7825

Bug OCPBUGS-25701: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/ovn-kubernetes/pull/1994

Story TRT-1377: nightly-4.15-e2e-metal-ipi-sdn-bm failing to bootstrap affecting nightly payloads

The Metal team has filed ~~OCPBUGS-24328~~.

The job appears to have been permafailing for several days now. First affected payload: https://amd64.ocp.releases.ci.openshift.org/releasestream/4.15.0-0.nightly/release/4.15.0-0.nightly-2023-11-30-112918

Failure to bootstrap is quite hard to decipher for us.

https://github.com/openshift/machine-config-operator/pull/4053

Bug OCPBUGS-18390: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-monitoring-operator/pull/2081

Bug OCPBUGS-25643: Alert, Metrics page not loading in OCP Console

This is a clone of issue OCPBUGS-25313. The following is the description of the original issue:
—
Description of problem:

Unable to view the alerts, metrics page, getting a blank page.

Version-Release number of selected component (if applicable):

4.15.0-nightly

How reproducible:

Always

Steps to Reproduce:

Click on any alert under "Notification Panel" to view more, and you will be redirected to the alert page.

Actual results:

User is unable to view any alerts, metrics.

Expected results:

User should be able to view all/individual alerts, metrics.

Additional info:

N/A

https://github.com/openshift/monitoring-plugin/pull/89

Bug OCPBUGS-19103: Update 4.15 ose-sdn image to be consistent with ART

Please review the following PR: https://github.com/openshift/sdn/pull/574

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/sdn/pull/574

Bug OCPBUGS-21645: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-api/pull/182

Bug OCPBUGS-21829: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Bug OCPBUGS-24172: Update 4.15 kube-proxy-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/sdn/pull/592

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/sdn/pull/592

Bug OCPBUGS-25346: Unable to use oc-mirror on RHEL9 Host with FIPS enabled OCP cluster

This is a clone of issue OCPBUGS-23550. The following is the description of the original issue:
—
Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/oc-mirror/pull/764

Bug OCPBUGS-25824: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/aws-ebs-csi-driver/pull/253

Bug OCPBUGS-15817: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-network-operator/pull/2077

Bug OCPBUGS-26480: GCP CCM credentials should be granular

Description of problem:

GCP CCM should be using granular permissions rather than predefined roles.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/320

Bug OCPBUGS-27101: [regression] increased etcd leader elections significantly impacting vsphere amd64 platform

This is a clone of issue OCPBUGS-27094. The following is the description of the original issue:
—
Description of problem:

Based on this and this component readiness data, which compares success rates for those two particular tests, we are regressing by ~7-10% between the current 4.15 master and 4.14.z (in other words, we made the product ~10% worse).

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-vsphere-ovn-upi-serial/1720630313664647168

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-vsphere-ovn-serial/1719915053026643968

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-vsphere-ovn-upi-serial/1721475601161785344

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-vsphere-ovn-serial/1724202075631390720

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-vsphere-ovn-upi-serial/1721927613917696000

These jobs and their failures are all caused by increased etcd leader elections, which disrupt seemingly unrelated test cases across the vSphere AMD64 platform.

Since this particular platform's business significance is high, I'm setting this as "Critical" severity.

Please get in touch with me or Dean West if more teams need to be pulled into investigation and mitigation.

Version-Release number of selected component (if applicable):

4.15 / master

How reproducible:

Component Readiness Board

Actual results:

The etcd leader elections are elevated. Some jobs indicate that this is due to disk I/O throughput or network overload.

Expected results:

1. We NEED to understand what is causing this problem.
2. If we can mitigate this, we should.
3. If we cannot mitigate this, we need to document it or work with the vSphere infrastructure provider to fix the problem.
4. Optionally, we need a way to measure how often this happens in our fleet so we can evaluate how bad it is.

Additional info:

Bug OCPBUGS-24747: Update 4.16 ironic-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/ironic-image/pull/438

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/ironic-image/pull/438

Bug OCPBUGS-26041: [release-4.15] There is no response when clicking on button "Select a version" when there is new update

This is a clone of issue OCPBUGS-25780. The following is the description of the original issue:
—
Description of problem:

When a new update is available for the cluster, clicking "Select a version" on the Cluster Settings page has no effect.

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2023-12-19-033450

How reproducible:

Always

Steps to Reproduce:

    1. Prepare a cluster with an available update.
    2. Go to the Cluster Settings page and choose a version by clicking the "Select a version" button.
    3.

Actual results:

2. There is no response when clicking the button, so the user cannot select a version from the page.

Expected results:

2. After clicking the "Select a version" button, a modal should appear that lets the user select a version.

Additional info:

screenshot: https://drive.google.com/file/d/1Kpyu0kUKFEQczc5NVEcQFbf_uly_S60Y/view?usp=sharing

https://github.com/openshift/console/pull/13479

Bug OCPBUGS-19149: Update 4.15 ose-baremetal-installer image to be consistent with ART

Please review the following PR: https://github.com/openshift/installer/pull/7494

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/installer/pull/7494

Bug OCPBUGS-20213: [azure-stack-upi] worker nodes are not added into public lb backendpool

Description of problem:

After installing a 4.14 UPI cluster on Azure Stack Hub, the console cannot be accessed from outside the cluster.

$ curl -L -k https://console-openshift-console.apps.jimawwt.installer.redhat.wwtatc.com -vv
*   Trying 10.255.96.76:443...
* connect to 10.255.96.76 port 443 failed: Connection timed out
* Failed to connect to console-openshift-console.apps.jimawwt.installer.redhat.wwtatc.com port 443: Connection timed out
* Closing connection 0
curl: (28) Failed to connect to console-openshift-console.apps.jimawwt.installer.redhat.wwtatc.com port 443: Connection timed out

Worker nodes are missing from the public LB backend pool:
$ az network lb address-pool list --lb-name jimawwt-jhvtn -g jimawwt-jhvtn-rg
[
  {
    "backendIPConfigurations": [
      {
        "id": "/subscriptions/de7e09c3-b59a-4c7d-9c77-439c11b92879/resourceGroups/jimawwt-jhvtn-rg/providers/Microsoft.Network/networkInterfaces/jimawwt-jhvtn-master-1-nic/ipConfigurations/pipConfig",
        "resourceGroup": "jimawwt-jhvtn-rg"
      },
      {
        "id": "/subscriptions/de7e09c3-b59a-4c7d-9c77-439c11b92879/resourceGroups/jimawwt-jhvtn-rg/providers/Microsoft.Network/networkInterfaces/jimawwt-jhvtn-master-0-nic/ipConfigurations/pipConfig",
        "resourceGroup": "jimawwt-jhvtn-rg"
      },
      {
        "id": "/subscriptions/de7e09c3-b59a-4c7d-9c77-439c11b92879/resourceGroups/jimawwt-jhvtn-rg/providers/Microsoft.Network/networkInterfaces/jimawwt-jhvtn-master-2-nic/ipConfigurations/pipConfig",
        "resourceGroup": "jimawwt-jhvtn-rg"
      }
    ],
    "etag": "W/\"7a9d24a2-ff06-4108-9aac-a277595792e3\"",
    "id": "/subscriptions/de7e09c3-b59a-4c7d-9c77-439c11b92879/resourceGroups/jimawwt-jhvtn-rg/providers/Microsoft.Network/loadBalancers/jimawwt-jhvtn/backendAddressPools/jimawwt-jhvtn",
    "loadBalancingRules": [
      {
        "id": "/subscriptions/de7e09c3-b59a-4c7d-9c77-439c11b92879/resourceGroups/jimawwt-jhvtn-rg/providers/Microsoft.Network/loadBalancers/jimawwt-jhvtn/loadBalancingRules/api-public",
        "resourceGroup": "jimawwt-jhvtn-rg"
      },
      {
        "id": "/subscriptions/de7e09c3-b59a-4c7d-9c77-439c11b92879/resourceGroups/jimawwt-jhvtn-rg/providers/Microsoft.Network/loadBalancers/jimawwt-jhvtn/loadBalancingRules/a1a1c7bfe78c14a41a9149d42d698824-TCP-80",
        "resourceGroup": "jimawwt-jhvtn-rg"
      },
      {
        "id": "/subscriptions/de7e09c3-b59a-4c7d-9c77-439c11b92879/resourceGroups/jimawwt-jhvtn-rg/providers/Microsoft.Network/loadBalancers/jimawwt-jhvtn/loadBalancingRules/a1a1c7bfe78c14a41a9149d42d698824-TCP-443",
        "resourceGroup": "jimawwt-jhvtn-rg"
      }
    ],
    "name": "jimawwt-jhvtn",
    "provisioningState": "Succeeded",
    "resourceGroup": "jimawwt-jhvtn-rg"
  }
]

A similar bug, OCPBUGS-14762, was detected on Azure UPI. On the installer side, we checked that the public LB name and backend pool name for UPI are the same as for ASH IPI.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-10-06-234925

How reproducible:

Always when installing Azure Stack UPI on 4.14

Steps to Reproduce:

1. Install UPI on Azure Stack Hub on 4.14
2.
3.

Actual results:

Worker nodes are missing from the public LB backend pool.

Expected results:

Worker nodes are added to the public LB backend pool, and applications can be accessed from outside the cluster.

Additional info:

The issue is only detected on 4.14 Azure Stack Hub UPI.
It works on ASH IPI and on 4.13/4.12 ASH UPI.

https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/291

Bug OCPBUGS-23786: After PatternFly5 update: Snippets in Quick starts aren't readable in dark mode

Issue 57 from https://docs.google.com/spreadsheets/d/1TR3ENY-GE_LQL9F-xH6NahtRHdu6IrYGhLqDDA8y0EI/edit#gid=1035185624

Open the "Add Helm Chart Repositories to extend the Developer Catalog for your project" quick start and go to the next step. If dark theme is enabled, the code sample shown there does not have the right style.

Note: Could we check whether we can also update the PatternFly quick start extension?

Screenshot: https://drive.google.com/file/d/1hxh5VI2S7jLKRdNlDQsdlAXL_G7TxtME/view?usp=sharing

https://github.com/openshift/console/pull/13366

Bug OCPBUGS-19918: when disabling ipsec, ds pods are deleted

Description of problem:

The issue was found while analyzing bug https://issues.redhat.com/browse/OCPBUGS-19817

Version-Release number of selected component (if applicable):

4.15.0-0.ci-2023-09-25-165744

How reproducible:

Every time

Steps to Reproduce:

The cluster is an IPsec cluster with the NS extension and the IPsec service enabled.
1. Enable east-west IPsec and wait for the cluster to settle.
2. Disable IPsec and wait for the cluster to settle.

You will observe that the IPsec pods are deleted.

Actual results:

No IPsec pods.

Expected results:

The IPsec pods should stay. See https://github.com/openshift/cluster-network-operator/blob/master/pkg/network/ovn_kubernetes.go#L314:
	// If IPsec is enabled for the first time, we start the daemonset. If it is
	// disabled after that, we do not stop the daemonset but only stop IPsec.
	//
	// TODO: We need to do this as, by default, we maintain IPsec state on the
	// node in order to maintain encrypted connectivity in the case of upgrades.
	// If we only unrender the IPsec daemonset, we will be unable to cleanup
	// the IPsec state on the node and the traffic will continue to be
	// encrypted.
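The rule in the quoted comment can be sketched as two independent predicates: one deciding whether the daemonset is rendered, and one deciding whether IPsec itself runs. The `ipsecState` type and both functions are hypothetical illustrations, not the cluster-network-operator's API; they only encode the rule that a daemonset, once deployed, stays rendered after IPsec is disabled so it can clean up node state. Under that rule, step 2 of the reproduction should leave the pods in place.

```go
package main

import "fmt"

// ipsecState is a hypothetical summary of what the operator knows when it
// reconciles; it is not the cluster-network-operator's real data model.
type ipsecState struct {
	EnabledInSpec     bool // IPsec is requested in the current network config
	DaemonSetDeployed bool // the IPsec daemonset already exists on the cluster
}

// renderIPsecDaemonSet reports whether the IPsec daemonset should be rendered.
// Once deployed, it is kept even after IPsec is disabled, so that it can clean
// up the IPsec state on the nodes instead of leaving traffic encrypted.
func renderIPsecDaemonSet(s ipsecState) bool {
	return s.EnabledInSpec || s.DaemonSetDeployed
}

// ipsecActive reports whether IPsec itself should be running on the nodes.
func ipsecActive(s ipsecState) bool {
	return s.EnabledInSpec
}

func main() {
	cases := []ipsecState{
		{EnabledInSpec: true, DaemonSetDeployed: false},  // first enable
		{EnabledInSpec: false, DaemonSetDeployed: true},  // disabled afterwards
		{EnabledInSpec: false, DaemonSetDeployed: false}, // never enabled
	}
	for _, c := range cases {
		fmt.Printf("%+v -> render daemonset: %t, IPsec active: %t\n",
			c, renderIPsecDaemonSet(c), ipsecActive(c))
	}
}
```

Keeping the two decisions separate is what allows "disable IPsec" to stop encryption without deleting the pods that must perform the node cleanup.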

Additional info:

https://github.com/openshift/cluster-network-operator/pull/2042

Bug OCPBUGS-24106: Update 4.15 ose-cluster-kube-scheduler-operator-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-kube-scheduler-operator/pull/513

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-kube-scheduler-operator/pull/513

Bug OCPBUGS-25802: olm-operator pod always restart due to "detected that every object is labelled, exiting to re-start the process..." when upgrading OCP to 4.15 from 4.14.6

This is a clone of issue OCPBUGS-25448. The following is the description of the original issue:
—
Description of problem:

When upgrading OCP 4.14.6 to 4.15.0-0.nightly-2023-12-13-032512, olm-operator pod always restarts, which blocks the cluster upgrading.

MacBook-Pro:~ jianzhang$ omg get clusterversion 
2023-12-15 16:24:34.977 | WARNING  | omg.utils.load_yaml:<module>:10 - yaml.CSafeLoader failed to load, using SafeLoader
NAME     VERSION  AVAILABLE  PROGRESSING  SINCE  STATUS
version  4.14.6   True       True         4h47m  Working towards 4.15.0-0.nightly-2023-12-13-032512: 701 of 873 done (80% complete), waiting on operator-lifecycle-manager

MacBook-Pro:~ jianzhang$ omg get pods 
2023-12-15 16:47:36.383 | WARNING  | omg.utils.load_yaml:<module>:10 - yaml.CSafeLoader failed to load, using SafeLoader
NAME                                     READY  STATUS     RESTARTS  AGE
catalog-operator-564b666f96-6nmq8        1/1    Running    1         1h59m
collect-profiles-28375140-n9f2p          0/1    Succeeded  0         42m
collect-profiles-28375155-sf2qj          0/1    Succeeded  0         27m
collect-profiles-28375170-xkbxf          0/1    Succeeded  0         12m
olm-operator-6bfd5f76bc-xb5lk            0/1    Running    27        1h59m
package-server-manager-5b7969559f-68nn7  2/2    Running    0         1h59m
packageserver-5ffcb95bff-fvvpx           1/1    Running    0         1h58m
packageserver-5ffcb95bff-hgvxt           1/1    Running    0         1h58m

MacBook-Pro:~ jianzhang$ omg logs olm-operator-6bfd5f76bc-xb5lk --previous
2023-12-15 16:23:02.300 | WARNING  | omg.utils.load_yaml:<module>:10 - yaml.CSafeLoader failed to load, using SafeLoader
2023-12-13T23:38:05.452697228Z time="2023-12-13T23:38:05Z" level=info msg="log level info"
2023-12-13T23:38:05.452950096Z time="2023-12-13T23:38:05Z" level=info msg="TLS keys set, using https for metrics"
2023-12-13T23:38:05.515929950Z time="2023-12-13T23:38:05Z" level=info msg="found nonconforming items" gvr="rbac.authorization.k8s.io/v1, Resource=rolebindings" nonconforming=1
2023-12-13T23:38:05.588194624Z time="2023-12-13T23:38:05Z" level=info msg="found nonconforming items" gvr="/v1, Resource=services" nonconforming=1
2023-12-13T23:38:06.116654658Z time="2023-12-13T23:38:06Z" level=info msg="detected ability to filter informers" canFilter=false
2023-12-13T23:38:06.118496116Z time="2023-12-13T23:38:06Z" level=info msg="registering labeller" gvr="apps/v1, Resource=deployments" index=0
...
...
2023-12-13T23:38:06.381370939Z time="2023-12-13T23:38:06Z" level=info msg="labeller complete" gvr="rbac.authorization.k8s.io/v1, Resource=clusterrolebindings" index=0
2023-12-13T23:38:06.381424190Z time="2023-12-13T23:38:06Z" level=info msg="starting clusteroperator monitor loop" monitor=clusteroperator
2023-12-13T23:38:06.381467749Z time="2023-12-13T23:38:06Z" level=info msg="detected that every object is labelled, exiting to re-start the process..."

Version-Release number of selected component (if applicable):

MacBook-Pro:~ jianzhang$ oc adm release info --commits registry.ci.openshift.org/ocp/release:4.15.0-0.nightly-2023-12-13-032512 |grep olm 
  operator-lifecycle-manager                     https://github.com/openshift/operator-framework-olm                         b4d2b70c34e9654afe30cf724f1dc85a1ce5c683
  operator-registry                              https://github.com/openshift/operator-framework-olm                         b4d2b70c34e9654afe30cf724f1dc85a1ce5c683

How reproducible:

Always

Steps to Reproduce:

1. Rerun this Prow job: https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-4.15-upgrade-from-stable-4.14-ibmcloud-ipi-f28/

Actual results:

    The cluster failed to upgrade because the OLM pods crashed.

Expected results:

    The cluster upgrades successfully.

Additional info:

Must-gather logs: https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-4.15-upgrade-from-stable-4.14-ibmcloud-ipi-f28/1734995337258471424/artifacts/ibmcloud-ipi-f28/gather-must-gather/artifacts/

https://github.com/openshift/operator-framework-olm/pull/643

Bug OCPBUGS-14322: Excessive permissions in web-console impersonating a user

Description of problem:

Excessive permissions in web-console impersonating a user

Version-Release number of selected component (if applicable):

4.10.55

How reproducible:

When impersonating a specific user ('99GU8710') in an OCP 4.10.55 cluster, we can see pods and logs in the web console, even though that user cannot access them from the command line.

Steps to Reproduce:

1. Create a user with LDAP (for example, new_user).
2. Do not give the user access to pod logs in OpenShift-related namespaces (for example, new_user should not be able to see pod logs in openshift-apiserver).
3. Impersonate the user (new_user) in the web console.
4. Try to view the openshift-apiserver pod logs in the web console; you will be able to see them.
5. Try to view the same logs from the command line as new_user; you will not be able to see them.

Actual results:

The `Impersonate the user` feature does not correctly enforce the impersonated user's permissions.

Expected results:

We should not be able to see pod logs if the impersonated user does not have permission to view them.

Additional info:

https://github.com/openshift/console/pull/13196

Bug OCPBUGS-19492: Keepalived on bootstrap doesn't start due to missing configuration

Description of problem:

Keepalived constantly fails on the bootstrap node, causing the installation to fail.

It appears that the keepalived.conf file is missing and that the keepalived monitor fails on

Version-Release number of selected component (if applicable):

4.13.12

How reproducible:

Regular installation through the Assisted Installer

Steps to Reproduce:

1.
2.
3.

Actual results:

keepalived fails to start

Expected results:

Success

Additional info:

https://github.com/openshift/baremetal-runtimecfg/pull/276

Bug OCPBUGS-18850: Update 4.15 golang-github-prometheus-node_exporter image to be consistent with ART

Please review the following PR: https://github.com/openshift/node_exporter/pull/131

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/node_exporter/pull/131

Bug OCPBUGS-24072: Update 4.15 ose-aws-ebs-csi-driver-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/aws-ebs-csi-driver/pull/245

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/aws-ebs-csi-driver/pull/245

Story OCPCLOUD-2167: CPMS tests should cover unhealthy node cases

User Story

As a developer of CPMS, I want to ensure that unhealthy nodes can be replaced so that we can recommend CPMS to users.

Background

QE has some manual test cases covering a couple of unhappy scenarios for the CPMS that should result in automatic recovery.

I would like to see these automated as part of the periodic suite for CPMS.

The behaviour itself isn't really dependent on CPMS, but the whole workflow is.
The behaviour is primarily driven by other components and how they react, but failures there block CPMS from operating as expected.

The two cases I would like to see added are:

Terminate an instance on the cloud provider
- Once terminated, the node object should get removed
- Once the node object is removed, the machine should enter a failed state
- Terminate the Machine
- Eventually a new Machine comes up
- Eventually the old Machine goes away
- Eventually the cluster stabilises
Terminate the kubelet on the node
- SSH to the node and terminate kubelet
- Eventually the node will go into unready (condition)
- Delete the Machine object (MHC would do this in the real world)
- Eventually a new Machine becomes ready
- Eventually the old Machine goes away
- Eventually the cluster stabilises

Steps

Review the previous bug and Daniel's work to understand what got broken
Understand how to terminate an instance in the cloud from a test
Understand how to stop kubelet from a test (oc debug?, SSH creds in cluster?)
Write the tests as described above
Ensure the tests actually pass

Stakeholders

Cluster Infra
TRT
etcd

Definition of Done

Tests are added to the existing periodic suite

https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/239

Task HOSTEDCP-1305: Refactor calls to get HostedCluster namespace

General code cleanup and improvement

https://github.com/openshift/hypershift/pull/2619

Bug OCPBUGS-19116: Update 4.15 ose-gcp-cluster-api-controllers image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-api-provider-gcp/pull/200

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-api-provider-gcp/pull/200

Bug OCPBUGS-24143: Update 4.15 ose-machine-api-provider-gcp-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/machine-api-provider-gcp/pull/72

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/machine-api-provider-gcp/pull/72

Bug OCPBUGS-16794: The file permission of the controller manager pod specification file should be set to 600 to conform with CIS benchmarks

View the Description View the linked PRs

Description of problem:

Observation from the CIS v1.4 PDF:
1.1.3 Ensure that the controller manager pod specification file

When I checked, I found that the CIS v1.4 PDF describes the controller manager pod specification file as follows:
"Ensure that the controller manager pod specification file has permissions of 600 or more
restrictive.
 
OpenShift 4 deploys two API servers: the OpenShift API server and the Kube API server. The OpenShift API server delegates requests for Kubernetes objects to the Kube API server.
The OpenShift API server is managed as a deployment. The pod specification yaml for openshift-apiserver is stored in etcd.
The Kube API Server is managed as a static pod. The pod specification file for the kube-apiserver is created on the control plane nodes at /etc/kubernetes/manifests/kube-apiserver-pod.yaml. The kube-apiserver is mounted via hostpath to the kube-apiserver pods via /etc/kubernetes/static-pod-resources/kube-apiserver-pod.yaml with permissions 600."
 
To conform with CIS benchmarks, the controller manager pod specification file should be updated to 600.

$ for i in $( oc get pods -n openshift-kube-controller-manager -o name -l app=kube-controller-manager)
do                          
oc exec -n openshift-kube-controller-manager $i -- stat -c %a /etc/kubernetes/static-pod-resources/kube-controller-manager-pod.yaml  
done                                                                    
644
644
644
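"600 or more restrictive" means the file mode may not grant any permission bit outside 0600. A minimal sketch of that check as a bit mask follows; `isRestrictiveEnough` and `checkFile` are hypothetical helpers that only make the rule concrete. By this rule, the 644 results above fail because 0644 also sets the group-read and other-read bits.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
)

// maxSpecPerm is the most permissive mode the benchmark allows for the file.
const maxSpecPerm fs.FileMode = 0o600

// isRestrictiveEnough reports whether mode grants no permission bits beyond
// 0600, i.e. it is 600 "or more restrictive" (for example 400 or 000).
func isRestrictiveEnough(mode fs.FileMode) bool {
	return mode.Perm()&^maxSpecPerm == 0
}

// checkFile applies the rule to a file on disk.
func checkFile(path string) (bool, error) {
	info, err := os.Stat(path)
	if err != nil {
		return false, err
	}
	return isRestrictiveEnough(info.Mode()), nil
}

func main() {
	for _, m := range []fs.FileMode{0o644, 0o600, 0o400} {
		fmt.Printf("%#o compliant: %t\n", uint32(m), isRestrictiveEnough(m))
	}
	// Optionally check a real file passed as the first argument.
	if len(os.Args) > 1 {
		ok, err := checkFile(os.Args[1])
		fmt.Println(os.Args[1], ok, err)
	}
}
```

A mask test is used rather than comparing against 0600 directly, because modes such as 0400 are also compliant.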

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-07-20-215234

How reproducible:

Always

Steps to Reproduce:

1.
2.
3.

Actual results:

The controller manager pod specification file has permissions 644.

Expected results:

The controller manager pod specification file has permissions 600 or more restrictive.

Additional info:

https://github.com/openshift/library-go/commit/19a42d2bae8ba68761cfad72bf764e10d275ad6e

Bug OCPBUGS-24030: MachineConfigNode condition Cordoned is used to show status of both cordon and uncordon

When we apply a machine config with an additional SSH key, the action only needs to uncordon the node. While the uncordon is happening, the condition reports Cordoned = True, which confuses users. Maybe we can refine this design to show the cordon and uncordon status separately.

lastTransitionTime: '2023-11-28T16:53:58Z'
message: 'Action during previous iteration: (Un)Cordoned node. The node is reporting Unschedulable = false'
reason: UpdateCompleteCordoned
status: 'False'
type: Cordoned

https://github.com/openshift/machine-config-operator/pull/4065

Bug OCPBUGS-19236: Update 4.15 cluster-network-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-network-operator/pull/2006

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-network-operator/pull/2006

Bug OCPBUGS-18114: [CNO] nodeSelector[beta.kubernetes.io/os]: deprecated since v1.14

Description of problem:


Warning: spec.template.spec.nodeSelector[beta.kubernetes.io/os]: deprecated since v1.14; use "kubernetes.io/os" instead

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-08-17-145803

How reproducible:
Always

Steps to Reproduce:

1. oc rollout restart ds/ovnkube-node
2.
3.

Actual results:

Warning: spec.template.spec.nodeSelector[beta.kubernetes.io/os]: deprecated since v1.14; use "kubernetes.io/os" instead

Expected results:

No warning
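Avoiding the warning amounts to selecting nodes with the `kubernetes.io/os` key instead of the deprecated `beta.kubernetes.io/os` key. The sketch below shows that substitution on a plain map; `migrateNodeSelector` is a hypothetical helper, not cluster-network-operator code, and the actual fix may simply change the daemonset manifest template.

```go
package main

import "fmt"

const (
	deprecatedOSLabel = "beta.kubernetes.io/os" // deprecated since Kubernetes v1.14
	osLabel           = "kubernetes.io/os"
)

// migrateNodeSelector returns a copy of sel with the deprecated OS label
// replaced by kubernetes.io/os. An existing kubernetes.io/os entry wins.
func migrateNodeSelector(sel map[string]string) map[string]string {
	out := make(map[string]string, len(sel))
	for k, v := range sel {
		if k == deprecatedOSLabel {
			continue // drop the deprecated key
		}
		out[k] = v
	}
	if v, ok := sel[deprecatedOSLabel]; ok {
		if _, exists := out[osLabel]; !exists {
			out[osLabel] = v
		}
	}
	return out
}

func main() {
	before := map[string]string{deprecatedOSLabel: "linux"}
	fmt.Println(migrateNodeSelector(before)) // map[kubernetes.io/os:linux]
}
```

Returning a copy keeps the input untouched, which avoids surprising callers that share the original selector map.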

https://github.com/openshift/cluster-network-operator/pull/1845

Bug OCPBUGS-21798: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/service-ca-operator/pull/223

Bug OCPBUGS-23554: After PatternFly 5 update? YAML edit tab collapse the current section after user changes the content

View the Description View the linked PRs

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/console/pull/13414

Bug OCPBUGS-18847: Update 4.15 ose-multus-whereabouts-ipam-cni image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/whereabouts-cni/pull/192

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/whereabouts-cni/pull/192

Bug OCPBUGS-13044: machine-config-operator does not honor ICSP when fetching machine-os-content

View the Description View the linked PRs

Description of problem:

During cluster installations/upgrades with an imageContentSourcePolicy in place but with access to quay.io, the ICSP is not honored when pulling the machine-os-content image, so the image is not pulled from the private registry.

Version-Release number of selected component (if applicable):

$ oc logs -n openshift-machine-config-operator ds/machine-config-daemon -c machine-config-daemon|head -1
Found 6 pods, using pod/machine-config-daemon-znknf
I0503 10:53:00.925942    2377 start.go:112] Version: v4.12.0-202304070941.p0.g87fedee.assembly.stream-dirty (87fedee690ae487f8ae044ac416000172c9576a5)

How reproducible:

100% in clusters with ICSP configured BUT with access to quay.io

Steps to Reproduce:

1. Create mirror repo:
$ cat <<EOF > /tmp/isc.yaml                                                    
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
archiveSize: 4
storageConfig:
  registry:
    imageURL: quay.example.com/mirror/oc-mirror-metadata
    skipTLS: true
mirror:
  platform:
    channels:
    - name: stable-4.12
      type: ocp
      minVersion: 4.12.13
    graph: true
EOF
$ oc mirror --dest-skip-tls  --config=/tmp/isc.yaml docker://quay.example.com/mirror/oc-mirror-metadata
<...>
info: Mirroring completed in 2m27.91s (138.6MB/s)
Writing image mapping to oc-mirror-workspace/results-1683104229/mapping.txt
Writing UpdateService manifests to oc-mirror-workspace/results-1683104229
Writing ICSP manifests to oc-mirror-workspace/results-1683104229

2. Confirm machine-os-content digest:
$ oc adm release info 4.12.13 -o jsonpath='{.references.spec.tags[?(@.name=="machine-os-content")].from}'|jq
{
  "kind": "DockerImage",
  "name": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a1660c8086ff85e569e10b3bc9db344e1e1f7530581d742ad98b670a81477b1b"
}
$ oc adm release info 4.12.14 -o jsonpath='{.references.spec.tags[?(@.name=="machine-os-content")].from}'|jq
{
  "kind": "DockerImage",
  "name": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ed68d04d720a83366626a11297a4f3c5761c0b44d02ef66fe4cbcc70a6854563"
}

3. Create 4.12.13 cluster with ICSP at install time:
$ grep imageContentSources -A6 ./install-config.yaml
imageContentSources:
  - mirrors:
    - quay.example.com/mirror/oc-mirror-metadata/openshift/release
    source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
  - mirrors:
    - quay.example.com/mirror/oc-mirror-metadata/openshift/release-images
    source: quay.io/openshift-release-dev/ocp-release

Actual results:

1. After the installation completes, no pulls of a166 (4.12.13-x86_64-machine-os-content) are logged in the Quay usage logs, whereas digest 22d2 (4.12.13-x86_64-machine-os-images), for example, is reported as pulled from the mirror.

2. After upgrading to 4.12.14 no pulls for ed68 (4.12.14-x86_64-machine-os-content) are logged in the mirror-registry while the image was pulled as part of `oc image extract` in the machine-config-daemon:

[core@master-1 ~]$ sudo less /var/log/pods/openshift-machine-config-operator_machine-config-daemon-7fnjz_e2a3de54-1355-44f9-a516-2f89d6c6ab8f/machine-config-daemon/0.log
2023-05-03T10:51:43.308996195+00:00 stderr F I0503 10:51:43.308932   11290 run.go:19] Running: nice -- ionice -c 3 oc image extract -v 10 --path /:/run/mco-extensions/os-extensions-content-4035545447 --registry-config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ad48fe01f3e82584197797ce2151eecdfdcce67ae1096f06412e5ace416f66ce
2023-05-03T10:51:43.418211869+00:00 stderr F I0503 10:51:43.418008  184455 client_mirrored.go:174] Attempting to connect to quay.io/openshift-release-dev/ocp-v4.0-art-dev
2023-05-03T10:51:43.418211869+00:00 stderr F I0503 10:51:43.418174  184455 round_trippers.go:466] curl -v -XGET  -H "User-Agent: oc/4.12.0 (linux/amd64) kubernetes/31aa3e8" 'https://quay.io/v2/'
2023-05-03T10:51:43.419618513+00:00 stderr F I0503 10:51:43.419517  184455 round_trippers.go:495] HTTP Trace: DNS Lookup for quay.io resolved to [{34.206.15.82 } {54.209.210.231 } {52.5.187.29 } {52.3.168.193 }  {52.21.36.23 } {50.17.122.58 } {44.194.68.221 } {34.194.241.136 } {2600:1f18:483:cf01:ebba:a861:1150:e245 } {2600:1f18:483:cf02:40f9:477f:ea6b:8a2b } {2600:1f18:483:cf02:8601:2257:9919:cd9e } {2600:1f18:483:cf01:8212:fcdc:2a2a:50a7 } {2600:1f18:483:cf00:915d:9d2f:fc1f:40a7 } {2600:1f18:483:cf02:7a8b:1901:f1cf:3ab3 } {2600:1f18:483:cf00:27e2:dfeb:a6c7:c4db } {2600:1f18:483:cf01:ca3f:d96e:196c:7867 }]
2023-05-03T10:51:43.429298245+00:00 stderr F I0503 10:51:43.429151  184455 round_trippers.go:510] HTTP Trace: Dial to tcp:34.206.15.82:443 succeed

Expected results:

All images are pulled from the location as configured in the ICSP.
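The expected behavior can be pictured as a prefix rewrite: a reference whose repository matches an ICSP source is tried against each mirror first. This Go sketch is a simplification under stated assumptions (the first matching rule wins, and the original reference is kept as a last-resort fallback); mirrorRule and candidatePullRefs are hypothetical names, not the MCO or containers/image code:

```go
package main

import (
	"fmt"
	"strings"
)

// mirrorRule is a hypothetical, simplified form of one imageContentSources
// entry: pulls from Source should be attempted against each of Mirrors first.
type mirrorRule struct {
	Source  string
	Mirrors []string
}

// candidatePullRefs returns the references to try for image, in order. For the
// first rule whose Source is the image's repository (or a parent of it), each
// mirror is substituted for the source prefix; the original reference is kept
// last as a fallback. Images that match no rule are returned unchanged.
func candidatePullRefs(image string, rules []mirrorRule) []string {
	for _, r := range rules {
		rest := strings.TrimPrefix(image, r.Source)
		// Only accept a match on a repository boundary, so that a source of
		// ".../ocp-release" does not match ".../ocp-release-nightly".
		if rest == image || (rest != "" && !strings.ContainsAny(rest[:1], "@:/")) {
			continue
		}
		refs := make([]string, 0, len(r.Mirrors)+1)
		for _, m := range r.Mirrors {
			refs = append(refs, m+rest)
		}
		return append(refs, image)
	}
	return []string{image}
}

func main() {
	rules := []mirrorRule{{
		Source:  "quay.io/openshift-release-dev/ocp-v4.0-art-dev",
		Mirrors: []string{"quay.example.com/mirror/oc-mirror-metadata/openshift/release"},
	}}
	img := "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ed68d04d720a83366626a11297a4f3c5761c0b44d02ef66fe4cbcc70a6854563"
	for _, ref := range candidatePullRefs(img, rules) {
		fmt.Println(ref)
	}
}
```

Under this model, the log above shows the second candidate (quay.io) being contacted directly, without the mirror being tried first.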

Additional info:

https://github.com/openshift/machine-config-operator/pull/3921

Bug OCPBUGS-23248: When a receiver is created for alert notification through web console uses match instead of matchers

View the Description View the linked PRs

Description of problem:

An alert notification receiver created through the web console is created with the deprecated match field instead of matchers. When match is changed to matchers, the Alertmanager pods go into CrashLoopBackOff state with the error:
~~~
ts=2023-11-14T08:42:39.694Z caller=coordinator.go:118 level=error component=configuration msg="Loading configuration file failed" file=/etc/alertmanager/config_out/alertmanager.env.yaml err="yaml: unmarshal errors:\n  line 51: cannot unmarshal !!map into []string"
~~~

Version-Release number of selected component (if applicable):

How reproducible:

Always

Steps to Reproduce:

1. Create alert notification receiver through web console.
Administration-->configuration-->Alertmanager-->create receiver-->add receiver

2. Check the generated YAML: the route section contains match, not matchers.

3. Rename match to matchers without correctly rewriting the matcher definitions, such as severity or alertname.

4. Restart the Alertmanager pods; they go into CrashLoopBackOff state.

Actual results:

The alert notification receiver uses the match field.

Expected results:

The alert notification receiver should use the matchers field.
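The crash follows from a type difference: in the Alertmanager route configuration, match is a map of label name to value, while matchers is a list of matcher strings, so renaming the key without rewriting the value produces "cannot unmarshal !!map into []string". A minimal Go sketch of the conversion, assuming equality-only matches (match_re entries would need =~ instead); matchToMatchers is a hypothetical helper, not the console fix:

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// matchToMatchers (hypothetical helper) converts a deprecated route `match`
// map into the list form used by `matchers`, one equality matcher per label,
// e.g. severity="critical". Keys are sorted so the output is deterministic.
func matchToMatchers(match map[string]string) []string {
	keys := make([]string, 0, len(match))
	for k := range match {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	matchers := make([]string, 0, len(keys))
	for _, k := range keys {
		// strconv.Quote wraps the value in double quotes and escapes any
		// embedded quotes or backslashes.
		matchers = append(matchers, k+"="+strconv.Quote(match[k]))
	}
	return matchers
}

func main() {
	old := map[string]string{"severity": "critical", "alertname": "Watchdog"}
	fmt.Println(matchToMatchers(old)) // [alertname="Watchdog" severity="critical"]
}
```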

Additional info:

https://github.com/openshift/console/pull/13358

Bug OCPBUGS-23388: Pipeline Name gets changed to "new-pipeline" on the Edit Pipeline YAML/Builder

View the Description View the linked PRs

Description of problem:

Pipeline Name gets changed to "new-pipeline" on the Edit Pipeline YAML/Builder

Version-Release number of selected component (if applicable):

OpenShift 4.15
Pipelines Operator: 1.12.1

How reproducible:

Always, when the tasks are created using YAML and a Pipeline is then created with those tasks.
(NOT OBSERVED WHEN USING THE PIPELINE BUILDER)

Steps to Reproduce:

1. Create Task 1: https://tekton.dev/docs/getting-started/tasks/#create-and-run-a-basic-task
2. Create Task 2: https://tekton.dev/docs/getting-started/pipelines/#create-and-run-a-second-task
3. Create Pipeline: https://tekton.dev/docs/getting-started/pipelines/#create-and-run-a-pipeline
4. Click "Edit Pipeline" from the Actions Menu

Actual results:

The Pipeline name gets changed to "new-pipeline" on the Edit Pipeline YAML/Builder, and the Pipeline cannot be updated.

Expected results:

The pipeline name should not change.

Additional info:

Video : https://drive.google.com/file/d/19-dI8lSdH6tAZm3T8CQHw78P2AzdSIRv/view?usp=sharing

https://github.com/openshift/console/pull/13344

Bug OCPBUGS-24126: Update 4.15 prometheus-operator-admission-webhook-container image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/prometheus-operator/pull/260

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/prometheus-operator/pull/260

Bug OCPBUGS-24115: Update 4.15 ose-cluster-update-keys-container image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/cluster-update-keys/pull/52

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-update-keys/pull/52

Bug OCPBUGS-24177: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/machine-config-operator/pull/4049

Bug MGMT-16303: Boot after coreos-install fails with 4.15-ec.2 on KVM/s390x

View the Description View the linked PRs

Description of the problem:

The reboot that happens after writing the RHCOS image to the disk fails with 4.15-ec.2 on KVM s390.

How reproducible:

I am not able to reproduce it in the QEMU s390x emulator, but Amadeus Podvratnik hit the issue on real hardware.

Steps to reproduce:

1. Use assisted installer with version 4.15-ec.2 to install to a logical partition.

Actual results:

The installer writes the RHCOS image to the disk, but then fails to boot from it. Instead, it boots to the emergency shell and writes these errors to the console:

Nov 27 12:49:49 localhost ostree-prepare-root[1130]: ostree-prepare-root: Couldn't find specified OSTree root '/sysroot//ostree/boot.1/rhcos/452f29cc74e701f4f3f69e66657fe28788d6c490aa0032c138909b7b2ce429c7/0': No such file or directory
Nov 27 12:49:49 localhost systemd[1]: ostree-prepare-root.service: Main process exited, code=exited, status=1/FAILURE
Nov 27 12:49:49 localhost systemd[1]: ostree-prepare-root.service: Failed with result 'exit-code'.
Nov 27 12:49:49 localhost systemd[1]: Failed to start OSTree Prepare OS/.

Expected results:

Should boot and continue the installation.

https://github.com/openshift/assisted-service/pull/5765

Bug OCPBUGS-21733: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-openshift-apiserver-operator/pull/552

Bug OCPBUGS-25412: APIServer URL Env is required on all nodes

View the Description View the linked PRs

Description of problem:

    The apiserver-url.env file is a dependency of all CCM components. These mostly run on the masters, however, on Azure, they also run on workers.

A recent change in Kubernetes (https://github.com/kubernetes/kubernetes/pull/121028) fixed a previous bug, and as a result workers no longer bootstrap, because the kubelet no longer sets an IP address.

To resolve this issue, the CNM must be able to talk to the KAS outside of the CNI. This already works on masters, but the URL env file is missing on workers, so they get stuck.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/machine-config-operator/pull/4076

Bug OCPBUGS-19127: Update 4.15 ose-containernetworking-plugins image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/containernetworking-plugins/pull/122

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/containernetworking-plugins/pull/122

Bug OCPBUGS-15910: The KUBELET_NODE_IPS does not reflect in the kubelet service after the dual-stack conversion

View the Description View the linked PRs

$ oc get mc 01-master-kubelet -o json | jq -r '.spec.config.systemd.units | .[] | select(.name=="kubelet.service") | .contents'
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service network-online.target
Requires=crio.service kubelet-auto-node-size.service
After=network-online.target crio.service kubelet-auto-node-size.service
After=ostree-finalize-staged.service

[Service]
Type=notify
ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests
ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state
ExecStartPre=/bin/rm -f /var/lib/kubelet/memory_manager_state
EnvironmentFile=/etc/os-release
EnvironmentFile=-/etc/kubernetes/kubelet-workaround
EnvironmentFile=-/etc/kubernetes/kubelet-env
EnvironmentFile=/etc/node-sizing.env

ExecStart=/usr/local/bin/kubenswrapper \
    /usr/bin/kubelet \
      --config=/etc/kubernetes/kubelet.conf \
      --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
      --kubeconfig=/var/lib/kubelet/kubeconfig \
      --container-runtime=remote \
      --container-runtime-endpoint=/var/run/crio/crio.sock \
      --runtime-cgroups=/system.slice/crio.service \
      --node-labels=node-role.kubernetes.io/control-plane,node-role.kubernetes.io/master,node.openshift.io/os_id=${ID} \
      --node-ip=${KUBELET_NODE_IP} \
      --minimum-container-ttl-duration=6m0s \
      --cloud-provider= \
      --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \
       \
      --hostname-override=${KUBELET_NODE_NAME} \
      --provider-id=${KUBELET_PROVIDERID} \
      --register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
      --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4c0a1b82501a416df4b926801bc3aa378d2762d0570a0791c6675db1a3365c62 \
      --system-reserved=cpu=${SYSTEM_RESERVED_CPU},memory=${SYSTEM_RESERVED_MEMORY},ephemeral-storage=${SYSTEM_RESERVED_ES} \
      --v=${KUBELET_LOG_LEVEL}

Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

https://github.com/openshift/machine-config-operator/blob/29b3729923273ae7f42cd20e096fa1a390d4b108/templates/master/01-master-kubelet/_base/units/kubelet.service.yaml#L33

https://github.com/openshift/machine-config-operator/pull/3909

Bug SO-119: Resync OKD MariaDB and ruby imagestreams from library

View the Description View the linked PRs

Description of problem:

OKD samples were synced with an invalid MariaDB reference.

https://github.com/openshift/cluster-samples-operator/pull/525

Bug OCPBUGS-22710: Can we view status of an adminbased external route policy, if so then how/where?

View the Description View the linked PRs

Description of problem:

In the prerelease doc "Configure a secondary external gateway", step 3 states that the output of the following command should confirm that the admin policy has been created:

#oc describe apbexternalroute <name> | tail -n 6

First, this is a typo: there is no "apbexternalroute"; the correct term is "adminpolicybasedexternalroutes". Even with the correct term, the output says almost nothing about the status of the policy. It only describes the policy itself, plus minor details such as timestamps.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-10-04-143709

How reproducible:

Every time

Steps to Reproduce:

1. Deploy a cluster
2. Boot up a pod under a namespace
3. Apply the following policy: $ cat 4.create.abp_static_bar1.yaml
apiVersion: k8s.ovn.org/v1
kind: AdminPolicyBasedExternalRoute
metadata:
  name: first-policy
spec:
## gateway example
  from:
    namespaceSelector:
      matchLabels:
          kubernetes.io/metadata.name: bar
  nextHops:       
    static:
      - ip: "173.20.0.8"
      - ip: "173.20.0.9"
4. Confirm the policy is in place: $ oc get adminpolicybasedexternalroutes.k8s.ovn.org
NAME           LAST UPDATE   STATUS
first-policy   

5. But how do we check the policy's status?
The doc's guide doesn't help much: $ oc describe adminpolicybasedexternalroutes.k8s.ovn.org <name> | tail -n 6

$ oc describe adminpolicybasedexternalroutes.k8s.ovn.org first-policy 
Name:         first-policy
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  k8s.ovn.org/v1
Kind:         AdminPolicyBasedExternalRoute
Metadata:
  Creation Timestamp:  2023-10-30T20:09:20Z
  Generation:          1
  Resource Version:    10904672
  UID:                 3c4a60da-a618-45b1-94a8-2085dcdc5631
Spec:
  From:
    Namespace Selector:
      Match Labels:
        kubernetes.io/metadata.name:  bar
  Next Hops:
    Static:
      Bfd Enabled:  false
      Ip:           173.20.0.8
      Bfd Enabled:  false
      Ip:           173.20.0.9
Events:             <none>
 

Nothing regarding the policy status shows up, if this is even supported at all. Besides fixing the doc, if there is a way to view the status, it should be documented. Also, if there is indeed a policy status, shouldn't it also populate the STATUS column here:

$ oc get adminpolicybasedexternalroutes.k8s.ovn.org 
NAME           LAST UPDATE   STATUS
first-policy                   ^ 

I am asking because on another bug, https://issues.redhat.com/browse/OCPBUGS-22706, I recreated a situation where the status should have reported an error, yet it never did, and the table above was not updated either. The LAST UPDATE column has never shown any data either, so why do we have these two columns at all?

Actual results:

Expected results:

Additional info:

Bug OCPBUGS-24087: Update 4.15 cluster-etcd-operator-container image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/cluster-etcd-operator/pull/1169

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-etcd-operator/pull/1169

Bug OCPBUGS-25140: [release-4.15] Node Overview Pane not displaying

View the Description View the linked PRs

This is a clone of issue OCPBUGS-24408. The following is the description of the original issue:
—
Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/console/pull/13435

Bug MGMT-16037: [STG][Scale] Failed to update cluster with 103 nodes

View the Description View the linked PRs

A cluster with 103 nodes failed to update in the UI on the Networking page
with the dialog error: "The service is down, undergoing maintenance, or experiencing another issue."
And error in UI:
"[10] Message Size Too Large: the server has a configurable maximum message size to avoid
unbounded memory allocation and the client attempted to produce a message larger than this maximum"

And in the browser debugger:
PATCH https://api.stage.openshift.com/api/assisted-install/v2/clusters/674c7056-4db9-4ea6-9f1d-f976fc77897e 500 (Internal Server Error)

See attached screenshot

Steps to reproduce:
1. Create cluster, generate minimal ISO image, download to servers
2. Boot 103 nodes with ISO image
3. Wait until all nodes finish discovery
4. Click Next, Next
5. Set API and Ingress VIP in Networking page

Actual results:
An error dialog is raised: Unable to update cluster
The service is down, undergoing maintenance, or experiencing another issue.
It asks to Refresh, which returns to the Cluster details page.

Expected results:
The cluster should be updated and the installation should be allowed to continue.

https://github.com/openshift/assisted-service/pull/5628

Bug OCPBUGS-19171: Update 4.15 configmap-reload image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/configmap-reload/pull/56

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/configmap-reload/pull/56

Bug OCPBUGS-23787: After PatternFly5 update: Quickstarts catalog item count is not vertical aligned

View the Description View the linked PRs

Issue 58 from https://docs.google.com/spreadsheets/d/1TR3ENY-GE_LQL9F-xH6NahtRHdu6IrYGhLqDDA8y0EI/edit#gid=1035185624

The Quickstarts catalog item count is no longer vertically aligned.

Screenshot: https://drive.google.com/file/d/1hxh5VI2S7jLKRdNlDQsdlAXL_G7TxtME/view?usp=sharing

https://github.com/openshift/console/pull/13367

Bug MGMT-15926: [STG][OLM] Assisted Installer failed to install MCE operator on Multi Node cluster

View the Description View the linked PRs

Some operators failed to install
Multicluster engine (MCE) failed to install. Due to this, the cluster will be degraded, but you can try to install the operator from the Operator Hub. Please check the installation log for more information.

OpenShift version 4.14.0-rc.4

MCE installed successfully on OpenShift version 4.13.13.

Steps to reproduce:
1. Create cluster on AI SaaS version OCP 4.14.0-rc.4
2. Select MCE operator
3. Continue through the settings and start the installation

Actual results:
Cluster installed but
Operators
Multicluster engine failed

Expected results:
Operators
Multicluster engine installed

https://github.com/openshift/assisted-installer/pull/748

Bug OCPBUGS-19097: Update 4.15 openshift-proxy-pull-test image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/machine-config-operator/pull/3918

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/machine-config-operator/pull/3918

Bug OCPBUGS-20350: Vsphere IPI installation is getting failed with panic: runtime error: invalid memory address or nil pointer dereference

View the Description View the linked PRs

Description of problem:

vSphere IPI installation fails with panic: runtime error: invalid memory address or nil pointer dereference

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1. Download 4.13 installation binary
2. Run the openshift-install create cluster command.

Actual results:

Error:

DEBUG   Generating Platform Provisioning Check...
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0x3401c4e]

goroutine 1 [running]:
github.com/openshift/installer/pkg/asset/installconfig/vsphere.validateESXiVersion(0xc001524060?, {0xc00018aff0, 0x43}, 0x1?, 0x1?)
        /go/src/github.com/openshift/installer/pkg/asset/installconfig/vsphere/validation.go:279 +0xb6e
github.com/openshift/installer/pkg/asset/installconfig/vsphere.validateFailureDomain(0xc001524060, 0xc00022c840, 0x0)
        /go/src/github.com/openshift/installer/pkg/asset/installconfig/vsphere/validation.go:167 +0x6b6
github.com/openshift/installer/pkg/asset/installconfig/vsphere.ValidateForProvisioning(0xc0003d4780)
        /go/src/github.com/openshift/installer/pkg/asset/installconfig/vsphere/validation.go:132 +0x675
github.com/openshift/installer/pkg/asset/installconfig.(*PlatformProvisionCheck).Generate(0xc0000f2000?, 0x5?)
        /go/src/github.com/openshift/installer/pkg/asset/installconfig/platformprovisioncheck.go:112 +0x45f
github.com/openshift/installer/pkg/asset/store.(*storeImpl).fetch(0xc000925e90, {0x1dc012d0, 0x2279afa8}, {0x7c34091, 0x2})
        /go/src/github.com/openshift/installer/pkg/asset/store/store.go:226 +0x5fa
github.com/openshift/installer/pkg/asset/store.(*storeImpl).fetch(0xc000925e90, {0x1dc01090, 0x22749ce0}, {0x0, 0x0})
        /go/src/github.com/openshift/installer/pkg/asset/store/store.go:220 +0x75b
github.com/openshift/installer/pkg/asset/store.(*storeImpl).Fetch(0x7ffe670305f1?, {0x1dc01090, 0x22749ce0}, {0x227267a0, 0x8, 0x8})
        /go/src/github.com/openshift/installer/pkg/asset/store/store.go:76 +0x48
main.runTargetCmd.func1({0x7ffe670305f1, 0x6})
        /go/src/github.com/openshift/installer/cmd/openshift-install/create.go:260 +0x125
main.runTargetCmd.func2(0x2272da00?, {0xc000925410?, 0x3?, 0x3?})
        /go/src/github.com/openshift/installer/cmd/openshift-install/create.go:290 +0xe7
github.com/spf13/cobra.(*Command).execute(0x2272da00, {0xc000925380, 0x3, 0x3})
        /go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:920 +0x847
github.com/spf13/cobra.(*Command).ExecuteC(0xc000210900)
        /go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:1040 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
        /go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:968
main.installerMain()
        /go/src/github.com/openshift/installer/cmd/openshift-install/main.go:61 +0x2b0
main.main()
        /go/src/github.com/openshift/installer/cmd/openshift-install/main.go:38 +0xff

Expected results:

The installation should complete successfully.

Additional info:

https://github.com/openshift/installer/pull/7575

Bug OCPBUGS-23073: .spec.numberOfUsersToReport is not correctly applied in some circumstances

View the Description View the linked PRs

E1106 21:44:31.805740 18 apiaccess_count_controller.go:168] APIRequestCount.apiserver.openshift.io "nodes.v1" is invalid: [status.currentHour.byNode[0].byUser: Too many: 708: must have at most 500 items, status.last24h[21].byNode[0].byUser: Too many: 708: must have at most 500 items]

Seen in a large-scale test: 750 nodes, 180,000 pods, 90,000 services, with pods/services being created at 20 objects/second.

https://redhat-internal.slack.com/archives/CB48XQ4KZ/p1699307146216599

Luis Sanchez said "Just confirmed that under certain circumstances, the .spec.numberOfUsersToReport field is not being applied correctly. Open a bug please."

https://github.com/openshift/kubernetes/pull/1794

Bug OCPBUGS-25245: MCO the content mismatch bug revised when upgrading from 4.13.23 to 4.14.3

View the Description View the linked PRs

Description of problem:

    When upgrading a cluster from 4.13.23 to 4.14.3, the machine-config CO gets stuck due to a content mismatch error on all nodes.

Node node-xxx-xxx is reporting: "unexpected on-disk state
      validating against rendered-master-734521b50f69a1602a3a657419ed4971: content
      mismatch for file \"/etc/pki/ca-trust/source/anchors/openshift-config-user-ca-bundle.crt\""

Version-Release number of selected component (if applicable):

How reproducible:

    always

Steps to Reproduce:

    1. Perform an upgrade from 4.13.x to 4.14.x

Actual results:

    machine-config stalls during upgrade

Expected results:

    According to the MCO engineering team, the "content mismatch" should no longer happen.

Additional info:

https://github.com/openshift/machine-config-operator/pull/4073

Bug OCPBUGS-5755: GCP XPN private cluster install attempts to add masters to k8s-ig-xxxx instance groups

View the Description View the linked PRs

Description of problem:

When attempting a GCP XPN internal cluster installation, the install fails when the master nodes are added to a second [internal] instance group (k8s-ig-xxxx).

Version-Release number of selected component (if applicable):

How reproducible:

always

Steps to Reproduce:

1. The following install config was used:

additionalTrustBundlePolicy: Proxyonly
apiVersion: v1
baseDomain: installer.gcp.devcluster.openshift.com
credentialsMode: Passthrough
featureSet: TechPreviewNoUpgrade
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform: {}
  replicas: 3
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform: {}
  replicas: 3
metadata:
  creationTimestamp: null
  name: bbarbach-xpn
networking:
  clusterNetwork:
  - cidr: 10.124.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.128.0.0/16
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
platform:
  gcp:
    projectID: openshift-installer-shared-vpc
    region: us-central1
    network: bbarbach-internal-vpc
    computeSubnet: bbarbach-internal-vpc
    controlPlaneSubnet: bbarbach-internal-vpc
    networkProjectID: openshift-dev-installer
publish: Internal

2. This is a shared VPC install, so both the service and host projects must be set in the install-config above.

3. Set the release image to 4.13-nightly

4. openshift-install create cluster --log-level=DEBUG

Actual results:

ERROR                                              
ERROR Error: Error waiting for Updating RegionBackendService: Validation failed for instance 'projects/openshift-installer-shared-vpc/zones/us-central1-a/instances/bbarbach-xpn-4t8zl-master-0': instance may belong to at most one load-balanced instance group. 
ERROR                                              
ERROR                                              
ERROR   with google_compute_region_backend_service.api_internal, 
ERROR   on main.tf line 13, in resource "google_compute_region_backend_service" "api_internal": 
ERROR   13: resource "google_compute_region_backend_service" "api_internal" { 
ERROR                                              
FATAL failed disabling bootstrap load balancing: failed to apply Terraform: exit status 1 
FATAL                                              
FATAL Error: Error waiting for Updating RegionBackendService: Validation failed for instance 'projects/openshift-installer-shared-vpc/zones/us-central1-a/instances/bbarbach-xpn-4t8zl-master-0': instance may belong to at most one load-balanced instance group. 
FATAL                                              
FATAL                                              
FATAL   with google_compute_region_backend_service.api_internal, 
FATAL   on main.tf line 13, in resource "google_compute_region_backend_service" "api_internal": 
FATAL   13: resource "google_compute_region_backend_service" "api_internal" { 
FATAL                                              
FATAL

Expected results:

Successful install

Additional info:

The normal GCP internal cluster installation succeeds. Its instance groups show that the internal cluster creates the k8s-ig-xxxx instance groups and adds the workers to each respective group; the masters are NOT added to these instance groups. The XPN install fails because the masters are added to them.

https://github.com/openshift/cloud-provider-gcp/pull/35

Bug OCPBUGS-18855: Update 4.15 openshift-enterprise-builder image to be consistent with ART

Please review the following PR: https://github.com/openshift/builder/pull/357

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/builder/pull/357

Bug OCPBUGS-20525: Masters are not attached with the provided custom security groups which defined in platform.aws.defaultMachinePlatform

Description of problem:

Set custom security group IDs in the installconfig.platform.aws.defaultMachinePlatform.additionalSecurityGroupIDs field of install-config.yaml

such as: 

   apiVersion: v1
   controlPlane:
     architecture: amd64
     hyperthreading: Enabled
     name: master
     platform: {}
     replicas: 3
   compute:
   - architecture: amd64
     hyperthreading: Enabled
     name: worker
     platform: {}
     replicas: 3
   metadata:
     name: gpei-test1013
   platform:
     aws:
       region: us-east-2
       subnets:
       - subnet-0bc86b64e7736479c
       - subnet-0addd33c410b52251
       - subnet-093392f94a4099566
       - subnet-0b915a53042b6dc61
       defaultMachinePlatform:
         additionalSecurityGroupIDs:
         - sg-0fbc4c9733e6c18e7
         - sg-0b46b502b575d30ba
         - sg-02a59f8662d10c6d3


After installation, check the security groups attached to the master and worker instances: the masters do not have the specified custom security groups attached, while the workers do.

For one of the masters:
[root@preserve-gpei-worker k_files]# aws ec2 describe-instances --instance-ids i-08c0b0b6e4308be3b  --query 'Reservations[*].Instances[*].SecurityGroups[*]' --output json
[
    [
        [
            {
                "GroupName": "terraform-20231013000602175000000002",
                "GroupId": "sg-04b104d07075afe96"
            }
        ]
    ]
]

For one of the workers:
[root@preserve-gpei-worker k_files]# aws ec2 describe-instances --instance-ids i-00643f07748ec75da --query 'Reservations[*].Instances[*].SecurityGroups[*]' --output json
[
    [
        [
            {
                "GroupName": "test-sg2",
                "GroupId": "sg-0b46b502b575d30ba"
            },
            {
                "GroupName": "terraform-20231013000602174300000001",
                "GroupId": "sg-0d7cd50d4cb42e513"
            },
            {
                "GroupName": "test-sg3",
                "GroupId": "sg-02a59f8662d10c6d3"
            },
            {
                "GroupName": "test-sg1",
                "GroupId": "sg-0fbc4c9733e6c18e7"
            }
        ]
    ]
]


The master's controlplanemachineset does have the custom security groups configured, but they are not attached to the master instances in the end.

[root@preserve-gpei-worker k_files]# oc get controlplanemachineset -n openshift-machine-api cluster -o yaml |yq .spec.template.machines_v1beta1_machine_openshift_io.spec.providerSpec.value.securityGroups
- filters:
    - name: tag:Name
      values:
        - gpei-test1013-8lwtb-master-sg
- id: sg-02a59f8662d10c6d3
- id: sg-0b46b502b575d30ba
- id: sg-0fbc4c9733e6c18e7

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-10-12-104602

How reproducible:

Always

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

Setting the security groups in installconfig.controlPlane.platform.aws.additionalSecurityGroupIDs instead works as expected.
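
The expected behavior is that control-plane machines inherit platform.aws.defaultMachinePlatform settings the same way compute machines do; the controlplanemachineset output suggests the defaults reach the machine manifests but not the master instances that are actually created. A minimal Go sketch of that inheritance, assuming a pool-level list overrides the default entirely. The MachinePool type and effectiveSecurityGroups function are hypothetical simplifications, not the installer's actual types.

```go
package main

// MachinePool is a hypothetical, simplified stand-in for the AWS
// machine-pool settings in install-config; only the field relevant
// to this bug is kept.
type MachinePool struct {
	AdditionalSecurityGroupIDs []string
}

// effectiveSecurityGroups returns the additional security groups a pool
// should use: the pool's own list if set, otherwise the list inherited
// from platform.aws.defaultMachinePlatform.
func effectiveSecurityGroups(defaults, pool MachinePool) []string {
	if len(pool.AdditionalSecurityGroupIDs) > 0 {
		return pool.AdditionalSecurityGroupIDs
	}
	return defaults.AdditionalSecurityGroupIDs
}
```

Applying the same function to both the controlPlane and compute pools is what would keep masters and workers consistent.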

https://github.com/openshift/installer/pull/7589

Bug OCPBUGS-13968: Rebase coredns to upstream version based on k8s APIs v0.27

Description of problem:

The current version of openshift/coredns vendors Kubernetes 1.26 packages. OpenShift 4.14 is based on Kubernetes 1.27.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Check https://github.com/openshift/coredns/blob/release-4.14/go.mod

Actual results:

Kubernetes packages (k8s.io/api, k8s.io/apimachinery, and k8s.io/client-go) are at version v0.26

Expected results:

Kubernetes packages are at version v0.27.0 or later.

Additional info:

Using old Kubernetes API and client packages brings risk of API compatibility issues.

https://github.com/openshift/coredns/pull/94

Bug OCPBUGS-19181: Update 4.15 ose-ibmcloud-cluster-api-controllers image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-api-provider-ibmcloud/pull/58

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-api-provider-ibmcloud/pull/58

Bug OCPBUGS-24036: [4.15] CNO fails to apply ovnkube-master daemonset during upgrade

This is a clone of issue ~~OCPBUGS-22293~~. The following is the description of the original issue:
—
Description of problem:

Upgrading from 4.13.5 to 4.13.17 fails at the network operator upgrade.

Version-Release number of selected component (if applicable):

How reproducible:

Unknown, since we only had one cluster on 4.13.5.

Steps to Reproduce:

1. Have a cluster on version 4.13.5 with OVN-Kubernetes
2. Set desired update image to quay.io/openshift-release-dev/ocp-release@sha256:c1f2fa2170c02869484a4e049132128e216a363634d38abf292eef181e93b692
3. Wait until it reaches network operator

Actual results:

Error message: Error while updating operator configuration: could not apply (apps/v1, Kind=DaemonSet) openshift-ovn-kubernetes/ovnkube-master: failed to apply / update (apps/v1, Kind=DaemonSet) openshift-ovn-kubernetes/ovnkube-master: DaemonSet.apps "ovnkube-master" is invalid: [spec.template.spec.containers[1].lifecycle.preStop: Required value: must specify a handler type, spec.template.spec.containers[3].lifecycle.preStop: Required value: must specify a handler type]
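
The API server rejects the DaemonSet because the preStop hooks in containers[1] and containers[3] have no handler set; Kubernetes validation requires a lifecycle handler to specify exactly one handler type. A minimal Go sketch of that rule, using hypothetical simplified structs in place of corev1.LifecycleHandler (the real type follows the same idea with more fields):

```go
package main

import "errors"

// Hypothetical, simplified handler actions; nil pointer means "not set".
type execAction struct {
	Command []string
}

type httpGetAction struct {
	Path string
	Port int
}

type tcpSocketAction struct {
	Port int
}

// lifecycleHandler mirrors the shape of corev1.LifecycleHandler.
type lifecycleHandler struct {
	Exec      *execAction
	HTTPGet   *httpGetAction
	TCPSocket *tcpSocketAction
}

// validateHandler rejects a handler with zero handler types (the
// "Required value: must specify a handler type" error above) and one
// with more than one handler type.
func validateHandler(h lifecycleHandler) error {
	n := 0
	if h.Exec != nil {
		n++
	}
	if h.HTTPGet != nil {
		n++
	}
	if h.TCPSocket != nil {
		n++
	}
	switch {
	case n == 0:
		return errors.New("must specify a handler type")
	case n > 1:
		return errors.New("may not specify more than 1 handler type")
	}
	return nil
}
```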

Expected results:

Network operator upgrades successfully

Additional info:

Since I'm not able to attach files, please gather all required debug data from https://access.redhat.com/support/cases/#/case/03645170

https://github.com/openshift/cluster-network-operator/pull/2167

Bug OCPBUGS-24166: Update 4.15 ose-azure-cloud-controller-manager-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cloud-provider-azure/pull/98

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cloud-provider-azure/pull/98

Bug OCPBUGS-25600: AWS: The installer doesn’t precheck if node architecture and vm type are consistent

Description of problem:

The installer does not precheck whether the node architecture and VM type are consistent on AWS and GCP; the precheck works on Azure.

Version-Release number of selected component (if applicable):

    4.15.0-0.nightly-multi-2023-12-06-195439

How reproducible:

   Always

Steps to Reproduce:

    1. In install-config, set the compute architecture field to arm64 but choose an amd64 instance type
    2. Create the cluster
    3. Check the installation

Actual results:

Azure prechecks whether the architecture is consistent with the instance type when creating manifests, for example:
12-07 11:18:24.452 [INFO] Generating manifests files.....
12-07 11:18:24.452 level=info msg=Credentials loaded from file "/home/jenkins/ws/workspace/ocp-common/Flexy-install/flexy/workdir/azurecreds20231207-285-jd7gpj"
12-07 11:18:56.474 level=error msg=failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: controlPlane.platform.azure.type: Invalid value: "Standard_D4ps_v5": instance type architecture 'Arm64' does not match install config architecture amd64

AWS and GCP have no such precheck, so the installation fails partway through, after many resources have already been created. This case is more likely to happen in multi-architecture clusters.

Expected results:

The installer prechecks that the architecture and VM type are consistent, especially on platforms that support heterogeneous clusters (AWS, GCP, Azure).
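
A minimal Go sketch of the kind of early validation expected here, run before any cloud resources are created. The lookup table is a hypothetical, hand-written stand-in: a real check would query the cloud API instead (for example, AWS DescribeInstanceTypes reports supported architectures per instance type, as x86_64 or arm64, which would need mapping to amd64/arm64). The error text is modeled on the Azure message above.

```go
package main

import "fmt"

// instanceArch is a hypothetical, hard-coded lookup table used only for
// illustration.
var instanceArch = map[string]string{
	"m6i.xlarge":     "amd64", // AWS, Intel x86_64
	"m6g.xlarge":     "arm64", // AWS Graviton2
	"n2-standard-4":  "amd64", // GCP
	"t2a-standard-4": "arm64", // GCP Tau T2A (Arm)
}

// validateArch fails when the instance type's architecture differs from
// the architecture requested in install-config.
func validateArch(instanceType, configArch string) error {
	arch, ok := instanceArch[instanceType]
	if !ok {
		return fmt.Errorf("unknown instance type %q", instanceType)
	}
	if arch != configArch {
		return fmt.Errorf("instance type architecture %q does not match install config architecture %q", arch, configArch)
	}
	return nil
}
```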

Additional info:

https://github.com/openshift/installer/pull/7835

Bug OCPBUGS-20161: HostedCluster with ControlPlaneEndpoint: 443 also exposes on 6443

Description of problem:

HostedClusters with .status.controlPlaneEndpoint.port: 443 unexpectedly also expose the KAS on port 6443. This causes four security group rules to be consumed across the LoadBalancer services (443/6443 for router and 443/6443 for private-router) instead of just two (443 for router and 443 for private-router). This directly limits the number of HostedClusters on a management cluster, since there is a hard cap of 200 security group rules per security group.
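
To put the impact in numbers: with the stated cap of 200 rules per security group, and assuming each HostedCluster adds one rule per exposed port on each of its two LoadBalancer services (router and private-router) and nothing else shares the group, the extra 6443 port halves capacity from 100 to 50 HostedClusters. A small, hypothetical Go helper makes the arithmetic explicit:

```go
package main

// maxHostedClusters estimates how many HostedClusters fit in one security
// group, assuming every HostedCluster consumes one rule per exposed port
// on each of its LoadBalancer services and nothing else uses the group.
func maxHostedClusters(ruleLimit, lbServices, portsPerService int) int {
	return ruleLimit / (lbServices * portsPerService)
}
```

With ports 443 and 6443 exposed, maxHostedClusters(200, 2, 2) gives 50; with only 443, maxHostedClusters(200, 2, 1) gives 100.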

Version-Release number of selected component (if applicable):

4.14.0

How reproducible:

100%

Steps to Reproduce:

1. Create a HostedCluster resulting in its .status.controlPlaneEndpoint.port: 443
2. Observe that the router/private-router LoadBalancer services expose both ports 6443 and 443

Actual results:

The router/private-router LoadBalancer services expose both ports 6443 and 443

Expected results:

The router/private-router LoadBalancer services expose only port 443

Additional info:

https://github.com/openshift/hypershift/pull/3149

Bug OCPBUGS-21803: Ingress stuck in progressing when maxConnections increased to 2000000

Description of problem:

The test case https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-50926 was created for the NE-577 epic. When 'spec.tuningOptions.maxConnections' is increased to 2000000, the default ingress controller gets stuck in Progressing.

Version-Release number of selected component (if applicable):

How reproducible:

https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-50926

Steps to Reproduce:

1. Edit the default ingress controller and set the max value 2000000:
oc -n openshift-ingress-operator edit ingresscontroller default
  tuningOptions:
    maxConnections: 2000000
2. melvinjoseph@mjoseph-mac openshift-tests-private % oc -n openshift-ingress-operator get ingresscontroller default -o yaml | grep  -A1 tuningOptions
  tuningOptions:
    maxConnections: 2000000
3. melvinjoseph@mjoseph-mac openshift-tests-private % oc get co/ingress 
NAME      VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
ingress   4.15.0-0.nightly-2023-10-16-231617   True        True          False      3h42m   ingresscontroller "default" is progressing: IngressControllerProgressing: One or more status conditions indicate progressing: DeploymentRollingOut=True (DeploymentRollingOut: Waiting for router deployment rollout to finish: 1 old replica(s) are pending termination......

Actual results:

The default ingress controller is stuck in Progressing

Expected results:

The ingress controller should work as normal

Additional info:

melvinjoseph@mjoseph-mac openshift-tests-private % oc -n openshift-ingress get po
NAME                              READY   STATUS        RESTARTS   AGE
router-default-7cf67f448-gb7mr    0/1     Running       0          38s
router-default-7cf67f448-qmvks    0/1     Running       0          38s
router-default-7dcd556587-kvk8d   0/1     Terminating   0          3h53m
router-default-7dcd556587-vppk4   1/1     Running       0          3h53m
melvinjoseph@mjoseph-mac openshift-tests-private % 

melvinjoseph@mjoseph-mac openshift-tests-private % oc -n openshift-ingress get po
NAME                              READY   STATUS    RESTARTS   AGE
router-default-7cf67f448-gb7mr    0/1     Running   0          111s
router-default-7cf67f448-qmvks    0/1     Running   0          111s
router-default-7dcd556587-vppk4   1/1     Running   0          3h55m

melvinjoseph@mjoseph-mac openshift-tests-private % oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h28m   
baremetal                                  4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h55m   
cloud-controller-manager                   4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h58m   
cloud-credential                           4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h59m   
cluster-autoscaler                         4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h55m   
config-operator                            4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h56m   
console                                    4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h34m   
control-plane-machine-set                  4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h43m   
csi-snapshot-controller                    4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h39m   
dns                                        4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h54m   
etcd                                       4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h47m   
image-registry                             4.15.0-0.nightly-2023-10-16-231617   True        False         False      176m    
ingress                                    4.15.0-0.nightly-2023-10-16-231617   True        True          False      3h39m   ingresscontroller "default" is progressing: IngressControllerProgressing: One or more status conditions indicate progressing: DeploymentRollingOut=True (DeploymentRollingOut: Waiting for router deployment rollout to finish: 1 old replica(s) are pending termination......
insights                                   4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h49m   
kube-apiserver                             4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h45m   
kube-controller-manager                    4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h46m   
kube-scheduler                             4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h46m   
kube-storage-version-migrator              4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h56m   
machine-api                                4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h45m   
machine-approver                           4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h55m   
machine-config                             4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h53m   
marketplace                                4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h55m   
monitoring                                 4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h35m   
network                                    4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h57m   
node-tuning                                4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h39m   
openshift-apiserver                        4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h43m   
openshift-controller-manager               4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h39m   
openshift-samples                          4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h39m   
operator-lifecycle-manager                 4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h54m   
operator-lifecycle-manager-catalog         4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h54m   
operator-lifecycle-manager-packageserver   4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h43m   
service-ca                                 4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h56m   
storage                                    4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h36m   
melvinjoseph@mjoseph-mac openshift-tests-private % oc -n openshift-ingress-operator get po
NAME                               READY   STATUS    RESTARTS        AGE
ingress-operator-c6fd989fd-jsrzv   2/2     Running   4 (3h45m ago)   3h58m
melvinjoseph@mjoseph-mac openshift-tests-private % 


melvinjoseph@mjoseph-mac openshift-tests-private % oc -n openshift-ingress-operator logs ingress-operator-c6fd989fd-jsrzv -c ingress-operator --tail=20
2023-10-17T11:34:54.327Z    INFO    operator.ingress_controller    handler/enqueue_mapped.go:81    queueing ingress    {"name": "default", "related": ""}
2023-10-17T11:34:54.348Z    INFO    operator.ingress_controller    handler/enqueue_mapped.go:81    queueing ingress    {"name": "default", "related": ""}
2023-10-17T11:34:54.348Z    INFO    operator.ingress_controller    handler/enqueue_mapped.go:81    queueing ingress    {"name": "default", "related": ""}
2023-10-17T11:34:54.394Z    INFO    operator.ingressclass_controller    controller/controller.go:118    reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
2023-10-17T11:34:54.394Z    INFO    operator.route_metrics_controller    controller/controller.go:118    reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
2023-10-17T11:34:54.394Z    INFO    operator.status_controller    controller/controller.go:118    Reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
2023-10-17T11:34:54.397Z    INFO    operator.ingress_controller    controller/controller.go:118    reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
2023-10-17T11:34:54.429Z    INFO    operator.status_controller    controller/controller.go:118    Reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
2023-10-17T11:34:54.446Z    INFO    operator.certificate_controller    controller/controller.go:118    Reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
2023-10-17T11:34:54.553Z    INFO    operator.ingressclass_controller    controller/controller.go:118    reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
2023-10-17T11:34:54.553Z    INFO    operator.route_metrics_controller    controller/controller.go:118    reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
2023-10-17T11:34:54.553Z    INFO    operator.status_controller    controller/controller.go:118    Reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
2023-10-17T11:34:54.557Z    ERROR    operator.ingress_controller    controller/controller.go:118    got retryable error; requeueing    {"after": "59m59.9999758s", "error": "IngressController may become degraded soon: DeploymentReplicasAllAvailable=False"}
2023-10-17T11:34:54.558Z    INFO    operator.ingress_controller    controller/controller.go:118    reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
2023-10-17T11:34:54.583Z    INFO    operator.status_controller    controller/controller.go:118    Reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
2023-10-17T11:34:54.657Z    ERROR    operator.ingress_controller    controller/controller.go:118    got retryable error; requeueing    {"after": "59m59.345629987s", "error": "IngressController may become degraded soon: DeploymentReplicasAllAvailable=False"}
2023-10-17T11:34:54.794Z    INFO    operator.certificate_controller    controller/controller.go:118    Reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
2023-10-17T11:36:11.151Z    INFO    operator.ingress_controller    handler/enqueue_mapped.go:81    queueing ingress    {"name": "default", "related": ""}
2023-10-17T11:36:11.151Z    INFO    operator.ingress_controller    controller/controller.go:118    reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
2023-10-17T11:36:11.248Z    ERROR    operator.ingress_controller    controller/controller.go:118    got retryable error; requeueing    {"after": "58m42.755479533s", "error": "IngressController may become degraded soon: DeploymentReplicasAllAvailable=False"}
melvinjoseph@mjoseph-mac openshift-tests-private % 

 
melvinjoseph@mjoseph-mac openshift-tests-private % oc get po -n openshift-ingress
NAME                              READY   STATUS    RESTARTS      AGE
router-default-7cf67f448-gb7mr    0/1     Running   1 (71s ago)   3m57s
router-default-7cf67f448-qmvks    0/1     Running   1 (70s ago)   3m57s
router-default-7dcd556587-vppk4   1/1     Running   0             3h57m

melvinjoseph@mjoseph-mac openshift-tests-private %   oc -n openshift-ingress logs router-default-7cf67f448-gb7mr --tail=20 
I1017 11:39:22.623928       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:23.623924       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:24.623373       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:25.627359       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:26.623337       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:27.623603       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:28.623866       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:29.623183       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:30.623475       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:31.623949       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
melvinjoseph@mjoseph-mac openshift-tests-private %   oc -n openshift-ingress logs router-default-7cf67f448-qmvks --tail=20
I1017 11:39:34.553475       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:35.551412       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:36.551421       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
E1017 11:39:37.052068       1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory
I1017 11:39:37.551648       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:38.551632       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:39.551410       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:40.552620       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:41.552050       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:42.551076       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:42.564293       1 template.go:828] router "msg"="Shutdown requested, waiting 45s for new connections to cease" 

melvinjoseph@mjoseph-mac openshift-tests-private % oc -n openshift-ingress-operator get ingresscontroller 
NAME      AGE
default   3h59m
melvinjoseph@mjoseph-mac openshift-tests-private % oc -n openshift-ingress-operator get ingresscontroller default -o yaml
apiVersion: operator.openshift.io/v1
<-----snip---->
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2023-10-17T07:41:42Z"
    reason: Valid
    status: "True"
    type: Admitted
  - lastTransitionTime: "2023-10-17T07:57:01Z"
    message: The deployment has Available status condition set to True
    reason: DeploymentAvailable
    status: "True"
    type: DeploymentAvailable
  - lastTransitionTime: "2023-10-17T07:57:01Z"
    message: Minimum replicas requirement is met
    reason: DeploymentMinimumReplicasMet
    status: "True"
    type: DeploymentReplicasMinAvailable
  - lastTransitionTime: "2023-10-17T11:34:54Z"
    message: 1/2 of replicas are available
    reason: DeploymentReplicasNotAvailable
    status: "False"
    type: DeploymentReplicasAllAvailable
  - lastTransitionTime: "2023-10-17T11:34:54Z"
    message: |
      Waiting for router deployment rollout to finish: 1 old replica(s) are pending termination...
    reason: DeploymentRollingOut
    status: "True"
    type: DeploymentRollingOut
  - lastTransitionTime: "2023-10-17T07:41:43Z"
    message: The endpoint publishing strategy supports a managed load balancer
    reason: WantedByEndpointPublishingStrategy
    status: "True"
    type: LoadBalancerManaged
  - lastTransitionTime: "2023-10-17T07:57:24Z"
    message: The LoadBalancer service is provisioned
    reason: LoadBalancerProvisioned
    status: "True"
    type: LoadBalancerReady
  - lastTransitionTime: "2023-10-17T07:41:43Z"
    message: LoadBalancer is not progressing
    reason: LoadBalancerNotProgressing
    status: "False"
    type: LoadBalancerProgressing
  - lastTransitionTime: "2023-10-17T07:41:43Z"
    message: DNS management is supported and zones are specified in the cluster DNS
      config.
    reason: Normal
    status: "True"
    type: DNSManaged
  - lastTransitionTime: "2023-10-17T07:57:26Z"
    message: The record is provisioned in all reported zones.
    reason: NoFailedZones
    status: "True"
    type: DNSReady
  - lastTransitionTime: "2023-10-17T07:57:26Z"
    status: "True"
    type: Available
  - lastTransitionTime: "2023-10-17T11:34:54Z"
    message: |-
      One or more status conditions indicate progressing: DeploymentRollingOut=True (DeploymentRollingOut: Waiting for router deployment rollout to finish: 1 old replica(s) are pending termination...
      )
    reason: IngressControllerProgressing
    status: "True"
    type: Progressing
  - lastTransitionTime: "2023-10-17T07:57:28Z"
    status: "False"
    type: Degraded
  - lastTransitionTime: "2023-10-17T07:41:43Z"
<-----snip---->

Bug OCPBUGS-22166: network-tools throwing errors on --help

Description of problem:

network-tools -h
error: You must be logged in to the server (Unauthorized)
error: You must be logged in to the server (Unauthorized)
Usage: network-tools [command]

Version-Release number of selected component (if applicable):

How reproducible:

always

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/network-tools/pull/93

Bug MGMT-16258: When preparing for skipping reboot if installation disk is nvme the partition names are incorrect

Description of the problem:
When preparing to skip the reboot, the partition names are generated by appending "4" and "3" to the installation disk name. This is not always correct: for NVMe disks, "p4" and "p3" must be appended instead.
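
The naming rule can be captured in a small Go helper. This is a sketch of the general Linux kernel convention (insert "p" when the disk name ends in a digit, as with /dev/nvme0n1 or /dev/mmcblk0), not the code from the linked assisted-installer PR; the function name is hypothetical, and udev symlinks such as /dev/disk/by-id/* (which use a "-partN" suffix) are not handled.

```go
package main

import "fmt"

// partitionPath builds the device path for partition n of disk. When the
// disk name ends in a digit (e.g. /dev/nvme0n1), a "p" separator is
// inserted before the partition number; otherwise (e.g. /dev/sda) the
// number is appended directly.
func partitionPath(disk string, n int) string {
	if disk != "" {
		if last := disk[len(disk)-1]; last >= '0' && last <= '9' {
			return fmt.Sprintf("%sp%d", disk, n)
		}
	}
	return fmt.Sprintf("%s%d", disk, n)
}
```

For example, partitionPath("/dev/nvme0n1", 4) yields /dev/nvme0n1p4, while partitionPath("/dev/sda", 3) yields /dev/sda3.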

How reproducible:

Always with nvme

Steps to reproduce:

1. Install with an NVMe installation disk

Actual results:

The reboot is not skipped

Expected results:

The reboot should be skipped

https://github.com/openshift/assisted-installer/pull/752

Bug OCPBUGS-17408: The InstallPlan has two duplicate items in the clusterServiceVersionNames array, which causes duplicate items to be displayed on multiple pages in the console.

Description of problem:

An operator InstallPlan has duplicate values in installPlan?.spec.clusterServiceVersionNames, which are displayed on multiple pages in the management console.
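
The linked PR targets operator-framework-olm rather than the console; whichever side handles it, the core operation is an order-preserving de-duplication of spec.clusterServiceVersionNames. A minimal Go sketch, with a hypothetical function name:

```go
package main

// dedupeKeepOrder returns names with later duplicates removed, keeping
// the first occurrence of each value in its original position.
func dedupeKeepOrder(names []string) []string {
	seen := make(map[string]struct{}, len(names))
	out := make([]string, 0, len(names))
	for _, n := range names {
		if _, dup := seen[n]; dup {
			continue
		}
		seen[n] = struct{}{}
		out = append(out, n)
	}
	return out
}
```

Preserving order (rather than sorting or building a set) keeps the displayed list stable relative to what the InstallPlan recorded.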

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-07-31-181848

How reproducible:

Always

Expected results:

In the screenshots linked below, the clusterServiceVersionNames value should display only one item, but because there are duplicate values it is listed twice.

Additional info:

This bug causes duplicate values to be shown on several pages of the Management Console. Screenshots:
https://drive.google.com/file/d/1OwiLXU8iETNusCf6N2AhB5y-ykXwgyBU/view?usp=drive_link

https://drive.google.com/file/d/1qfMso1x-s--samU7OmDKU-3NVfxqsxWD/view?usp=drive_link

https://drive.google.com/file/d/1Z9mGRllp4ZLN2OlSNKZY2QTIDx8QpyVS/view?usp=drive_link

https://drive.google.com/file/d/1CYWMpKy_KmUV_KfIxCjS1FAWHYbYA6rw/view?usp=drive_link

https://github.com/openshift/operator-framework-olm/pull/596

Bug OCPBUGS-19219: Update 4.15 ose-agent-installer-utils image to be consistent with ART

Please review the following PR: https://github.com/openshift/agent-installer-utils/pull/29

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/agent-installer-utils/pull/29

Bug OCPBUGS-2117: [gcp] pre-emptible VM: machine-api-termination-handler not marking instance for deletion

Description of problem:

GCP preemptible VM termination is not being handled correctly by machine-api-termination-handler.

Version-Release number of selected component (if applicable):

Tested on both 4.10.22 and 4.11.2

How reproducible:

Reproducible by stopping a spot instance machine in GCP, as described in Steps to Reproduce.

Steps to Reproduce:

1. Create a spot instance machine in GCP.
2. Stop the instance.
3. Notice that the machine-api-termination-handler pod logs show no signal indicating that the instance was terminated.
4. Note that the machines list does show the TERMINATED status.
5. As a result, pods are not gracefully moved off within the 90-second window before the node is turned off.

Actual results:

The machine-api-termination-handler logs do not show any message such as "Instance marked for termination, marking Node for deletion"; no termination signal is received from GCP.

Expected results:

A terminated node should wait for pods to move off (up to 90 seconds) and then shut down, instead of shutting down immediately.

Additional info:
Here is the code:
https://github.com/openshift/machine-api-provider-gcp/blob/main/pkg/termination/termination.go#L96-L127

#forum-cloud slack thread:
https://coreos.slack.com/archives/CBZHF4DHC/p1656524730323259

#forum-node slack thread:
https://coreos.slack.com/archives/CK1AE4ZCK/p1656619821630479
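For context on what "signal" the handler waits for: GCP reports preemption through the instance metadata server, where the `instance/preempted` key returns `TRUE` or `FALSE` and requests must carry the `Metadata-Flavor: Google` header. The following is a minimal sketch of polling that key, written independently of the linked termination.go; the helper names `parsePreempted`, `checkPreempted` and `waitForPreemption` and the polling interval are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// preemptedURL is the GCE metadata key that reports preemption ("TRUE"/"FALSE").
const preemptedURL = "http://metadata.google.internal/computeMetadata/v1/instance/preempted"

// parsePreempted interprets the metadata response body.
func parsePreempted(body string) bool {
	return strings.TrimSpace(body) == "TRUE"
}

// checkPreempted performs a single metadata query against url.
func checkPreempted(ctx context.Context, client *http.Client, url string) (bool, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return false, err
	}
	// The metadata server rejects requests without this header.
	req.Header.Set("Metadata-Flavor", "Google")
	resp, err := client.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return false, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return false, err
	}
	return parsePreempted(string(body)), nil
}

// waitForPreemption polls url until preemption is reported, a query fails,
// or ctx is cancelled. A real handler would likely retry on errors.
func waitForPreemption(ctx context.Context, client *http.Client, url string, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		preempted, err := checkPreempted(ctx, client, url)
		if err != nil {
			return err
		}
		if preempted {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	client := &http.Client{Timeout: 2 * time.Second}
	if err := waitForPreemption(ctx, client, preemptedURL, time.Second); err != nil {
		fmt.Println("no preemption signal:", err)
		return
	}
	fmt.Println("instance marked for termination")
}
```

Outside GCP the metadata host does not resolve, so the demo simply prints the error.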

https://github.com/openshift/machine-api-provider-gcp/pull/71

Bug OCPBUGS-22003: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/azure-workload-identity/pull/13

Bug OCPBUGS-23120: [IBM ROKS] cluster-storage-operator does not set upgradeable=True

Description of problem:

This affects IBM ROKS (a managed service) running 4.14 and later.

cluster-storage-operator never sets the upgradeable=True condition, so it shows up as Unknown:

  - lastTransitionTime: "2023-11-08T19:07:01Z"
    reason: NoData
    status: Unknown
    type: Upgradeable

This is a regression from 4.13.

In 4.13, the controller in pkg/operator/snapshotcrd/controller.go set `upgradeable: True`:

    upgradeable := operatorapi.OperatorCondition{
        Type:   conditionsPrefix + operatorapi.OperatorStatusTypeUpgradeable,
        Status: operatorapi.ConditionTrue,
    }

In the 4.13 bundle from IBM ROKS, these two conditions are set in cluster-scoped-resources/operator.openshift.io/storages/cluster.yaml

  - lastTransitionTime: "2023-11-08T14:22:21Z"
    status: "True"
    type: SnapshotCRDControllerUpgradeable
  - lastTransitionTime: "2023-11-08T14:22:21Z"
    reason: AsExpected
    status: "False"
    type: SnapshotCRDControllerDegraded

So the SnapshotCRDController is running and sets `upgradeable: True` on 4.13.

But in the 4.14 bundle, SnapshotCRDController no longer exists.

https://github.com/openshift/cluster-storage-operator/pull/385/commits/fa9af3aad65b9d0e9c618453825e4defeaad59ac

So in 4.14 and later, pkg/operator/defaultstorageclass/controller.go should set the condition:

https://github.com/openshift/cluster-storage-operator/blob/dbb1514dbf9923c56a4a198374cc59e45f9bc0cc/pkg/operator/defaultstorageclass/controller.go#L97-L100

However, it only sets the condition when `syncErr == unsupportedPlatformError`, not when `syncErr == supportedByCSIError`, which is the case for the IBM VPC driver:

  - lastTransitionTime: "2023-11-08T14:22:23Z"
    message: 'DefaultStorageClassControllerAvailable: StorageClass provided by supplied
      CSI Driver instead of the cluster-storage-operator'
    reason: AsExpected
    status: "True"
    type: Available

So what controller will set `upgradeable: True` for IBM VPC?
IBM VPC uses this StatusFilter function for ROKS:

https://github.com/openshift/cluster-storage-operator/blob/dbb1514dbf9923c56a4a198374cc59e45f9bc0cc/pkg/operator/csidriveroperator/csioperatorclient/ibm-vpc-block.go#L17-L27

ROKS and AzureStack are the only deployments that use a StatusFilter function, so shouldRunController returns false here because the platform is ROKS:

https://github.com/openshift/cluster-storage-operator/blob/dbb1514dbf9923c56a4a198374cc59e45f9bc0cc/pkg/operator/csidriveroperator/driver_starter.go#L347-L349

This means that no controller sets `upgradeable: True`.
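The gap can be illustrated with a simplified, hypothetical model of the 4.14 logic described above. Only the two sentinel error names mirror the linked controller; the `ConditionStatus` type, the helper functions, and the assumption that the CSI driver operator controller would otherwise report the condition are illustrative, not the actual cluster-storage-operator code.

```go
package main

import (
	"errors"
	"fmt"
)

// Sentinel errors named after those in defaultstorageclass/controller.go.
var (
	unsupportedPlatformError = errors.New("unsupported platform")
	supportedByCSIError      = errors.New("storage class provided by CSI driver")
)

// ConditionStatus models the value of an operator condition.
type ConditionStatus string

const (
	ConditionTrue    ConditionStatus = "True"
	ConditionUnknown ConditionStatus = "Unknown"
)

// defaultStorageClassUpgradeable models the 4.14 behaviour: Upgradeable=True
// is reported only for unsupportedPlatformError.
func defaultStorageClassUpgradeable(syncErr error) (ConditionStatus, bool) {
	if errors.Is(syncErr, unsupportedPlatformError) {
		return ConditionTrue, true
	}
	// supportedByCSIError (IBM VPC) falls through: nothing is reported.
	return "", false
}

// aggregateUpgradeable returns the status that ends up on the ClusterOperator.
// csiControllerRuns is false on ROKS because StatusFilter makes
// shouldRunController return false. If nobody reports, it stays Unknown.
func aggregateUpgradeable(syncErr error, csiControllerRuns bool) ConditionStatus {
	if status, ok := defaultStorageClassUpgradeable(syncErr); ok {
		return status
	}
	if csiControllerRuns {
		// Assumption: the CSI driver operator controller reports the condition.
		return ConditionTrue
	}
	return ConditionUnknown
}

func main() {
	fmt.Println(aggregateUpgradeable(supportedByCSIError, false))      // Unknown (IBM ROKS)
	fmt.Println(aggregateUpgradeable(unsupportedPlatformError, false)) // True
}
```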

Version-Release number of selected component (if applicable):

4.14.0+

How reproducible:

Always

Steps to Reproduce:

1. Install 4.14 via IBM ROKS
2. Check status conditions in cluster-scoped-resources/config.openshift.io/clusteroperators/storage.yaml

Actual results:

upgradeable=Unknown

Expected results:

upgradeable=True

Additional info:

4.13 IBM ROKS must-gather:
https://github.com/Joseph-Goergen/ibm-roks-toolkit/releases/download/test/must-gather-4.13.tar.gz

4.14 IBM ROKS must-gather: 
https://github.com/Joseph-Goergen/ibm-roks-toolkit/releases/download/test/must-gather.tar.gz

https://github.com/openshift/cluster-storage-operator/pull/417

Bug OCPBUGS-22497: Inline Dockerbuild type doesn't preserve file modified timestamp

While developing a demo for a Java application on OpenShift, the deployment fails. The application is first built with the source-to-image (S2I) strategy; the resulting image (S2I builder plus compiled sources) is then used as the source for copying artefacts into a slimmer runtime image with an inline Dockerfile build strategy. The inline Dockerfile build does not preserve the modification time of the copied files, which differs from how 'docker' itself behaves with a multi-stage build.

Version-Release number of selected component (if applicable):

4.12.14

How reproducible:

Always

Steps to Reproduce:

1. git clone https://github.com/jerboaa/quarkus-quickstarts
2. cd quarkus-quickstarts && git checkout ocp-bug-inline-docker
3. oc new-project quarkus-appcds-nok
4. oc process -f rest-json-quickstart/openshift/quarkus_runtime_appcds_template.yaml | oc create -f -

Actual results:

$ oc logs quarkus-rest-json-appcds-4-xc47z
INFO exec -a "java" java -XX:MaxRAMPercentage=80.0 -XX:+UseParallelGC -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=20 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:+ExitOnOutOfMemoryError -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -Xshare:on -XX:SharedArchiveFile=/deployments/app-cds.jsa -Dquarkus.http.host=0.0.0.0 -cp "." -jar /deployments/rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar 
INFO running in /deployments
Error occurred during initialization of VM
Unable to use shared archive.
An error has occurred while processing the shared archive file.
A jar file is not the one used while building the shared archive file: rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar

Expected results:

Starting the Java application using /opt/jboss/container/java/run/run-java.sh ...
INFO exec -a "java" java -XX:MaxRAMPercentage=80.0 -XX:+UseParallelGC -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=20 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:+ExitOnOutOfMemoryError -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -Xshare:on -XX:SharedArchiveFile=/deployments/app-cds.jsa -Dquarkus.http.host=0.0.0.0 -cp "." -jar /deployments/rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar 
INFO running in /deployments
__  ____  __  _____   ___  __ ____  ______ 
 --/ __ \/ / / / _ | / _ \/ //_/ / / / __/ 
 -/ /_/ / /_/ / __ |/ , _/ ,< / /_/ /\ \   
--\___\_\____/_/ |_/_/|_/_/|_|\____/___/   
2023-10-27 18:13:01,866 INFO  [io.quarkus] (main) rest-json-quickstart 1.0.0-SNAPSHOT on JVM (powered by Quarkus 3.4.3) started in 0.966s. Listening on: http://0.0.0.0:8080
2023-10-27 18:13:01,867 INFO  [io.quarkus] (main) Profile prod activated. 
2023-10-27 18:13:01,867 INFO  [io.quarkus] (main) Installed features: [cdi, resteasy-reactive, resteasy-reactive-jackson, smallrye-context-propagation, vertx]

Additional info:

When deploying with AppCDS turned on, we can get the pods to start. Looking at the modification time of the offending file, we notice that it differs between the original s2i-merge-image (A) and the runtime image (B):

(A)
$ oc rsh quarkus-rest-json-appcds-s2i-1-x5hct stat /deployments/rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar
  File: /deployments/rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar
  Size: 16057039  	Blocks: 31368      IO Block: 4096   regular file
Device: 200001h/2097153d	Inode: 60146490    Links: 1
Access: (0664/-rw-rw-r--)  Uid: (  185/ default)   Gid: (    0/    root)
Access: 2023-10-27 18:11:22.000000000 +0000
Modify: 2023-10-27 18:11:22.000000000 +0000
Change: 2023-10-27 18:11:41.555586774 +0000
 Birth: 2023-10-27 18:11:41.491586774 +0000

(B)
$ oc rsh quarkus-rest-json-appcds-1-l7xw2 stat /deployments/rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar
  File: /deployments/rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar
  Size: 16057039  	Blocks: 31368      IO Block: 4096   regular file
Device: 2000a3h/2097315d	Inode: 71601163    Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2023-10-27 18:11:44.000000000 +0000
Modify: 2023-10-27 18:11:44.000000000 +0000
Change: 2023-10-27 18:12:12.169087346 +0000
 Birth: 2023-10-27 18:12:12.114087346 +0000

Both should have 'Modify: 2023-10-27 18:11:22.000000000 +0000'.

When I perform a local S2I build of the same application sources and then use this multi-stage Dockerfile, the modification time of the files remains the same:

FROM quarkus-app-uberjar:ubi9 as s2iimg

FROM registry.access.redhat.com/ubi9/openjdk-17-runtime as final
COPY --from=s2iimg /deployments/* /deployments/
ENV JAVA_OPTS_APPEND="-XX:+UseCompressedClassPointers -XX:+UseCompressedOops -Xshare:on -XX:SharedArchiveFile=app-cds.jsa"

as shown here:

$ sudo docker run --rm -ti --entrypoint /bin/bash quarkus-app-uberjar:ubi9 -c 'stat /deployments/rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar'
  File: /deployments/rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar
  Size: 16057020  	Blocks: 31368      IO Block: 4096   regular file
Device: 6fh/111d	Inode: 276781319   Links: 1
Access: (0664/-rw-rw-r--)  Uid: (  185/ default)   Gid: (    0/    root)
Access: 2023-10-27 15:52:28.000000000 +0000
Modify: 2023-10-27 15:52:28.000000000 +0000
Change: 2023-10-27 15:52:37.352926632 +0000
 Birth: 2023-10-27 15:52:37.288926109 +0000
$ sudo docker run --rm -ti --entrypoint /bin/bash quarkus-cds-app -c 'stat /deployments/rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar'
  File: /deployments/rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar
  Size: 16057020  	Blocks: 31368      IO Block: 4096   regular file
Device: 6fh/111d	Inode: 14916403    Links: 1
Access: (0664/-rw-rw-r--)  Uid: (  185/ default)   Gid: (    0/    root)
Access: 2023-10-27 15:52:28.000000000 +0000
Modify: 2023-10-27 15:52:28.000000000 +0000
Change: 2023-10-27 15:53:04.408147760 +0000
 Birth: 2023-10-27 15:53:04.346147253 +0000

Both have a modification time of 2023-10-27 15:52:28.000000000 +0000.

https://github.com/openshift/builder/pull/369

Bug OCPBUGS-24068: Update 4.15 ose-cluster-olm-operator-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-olm-operator/pull/35

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-olm-operator/pull/35

Bug OCPBUGS-19293: Update 4.15 openshift-enterprise-tests image to be consistent with ART

Please review the following PR: https://github.com/openshift/origin/pull/28264

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/origin/pull/28264

Bug OCPBUGS-20295: The error message "The operator does not support single namespace or global installation modes." is confusing

Description of problem:

An operator that supports only the SingleNamespace install mode cannot be installed using the web console.

Version-Release number of selected component (if applicable):

zhaoxia@xzha-mac doc_add_operator % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-10-08-220853   True        False         168m    Cluster version is 4.14.0-0.nightly-2023-10-08-220853

How reproducible:

always

Steps to Reproduce:

1. Install the CatalogSource:
zhaoxia@xzha-mac doc_add_operator % cat catsrc-singlenamespace.yaml 
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: nginx-index
  namespace: openshift-marketplace
spec:
  displayName: Test
  publisher: OLM-QE
  sourceType: grpc
  image: quay.io/olmqe/nginxolm-operator-index:v1-singlenamespace
  updateStrategy:
    registryPoll:
      interval: 10m
oc apply -f catsrc-singlenamespace.yaml

zhaoxia@xzha-mac doc_add_operator % oc get packagemanifests nginx-operator -o yaml
      installModes:
      - supported: false
        type: OwnNamespace
      - supported: true
        type: SingleNamespace
      - supported: false
        type: MultiNamespace
      - supported: false
        type: AllNamespaces

2. Install nginx-operator using the web console.

Actual results:

nginxolm cannot be installed; the console shows this error message:

"nginxolm can't be installed
The operator does not support single namespace or global installation modes."

The error message is confusing: nginx-operator does support SingleNamespace, but the message says "The operator does not support single namespace or global installation modes."

Expected results:

nginxolm can be installed
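A minimal sketch of the check the expected result implies, driven by the installModes from the packagemanifest output. It assumes the console can satisfy AllNamespaces, SingleNamespace and OwnNamespace but not MultiNamespace; the `InstallMode` struct, both helpers and the message wording are hypothetical and are not the console's code (which is TypeScript).

```go
package main

import (
	"fmt"
	"strings"
)

// InstallMode mirrors one entry of installModes in the packagemanifest output.
type InstallMode struct {
	Type      string
	Supported bool
}

// consoleModes is an assumption: the modes the web console can satisfy.
var consoleModes = map[string]bool{
	"AllNamespaces":   true,
	"SingleNamespace": true,
	"OwnNamespace":    true,
}

// canInstallFromConsole reports whether any supported mode is one the
// console can satisfy.
func canInstallFromConsole(modes []InstallMode) bool {
	for _, m := range modes {
		if m.Supported && consoleModes[m.Type] {
			return true
		}
	}
	return false
}

// installModeMessage names the modes the operator does support instead of
// a generic "does not support" statement.
func installModeMessage(name string, modes []InstallMode) string {
	var supported []string
	for _, m := range modes {
		if m.Supported {
			supported = append(supported, m.Type)
		}
	}
	if len(supported) == 0 {
		return fmt.Sprintf("%s does not declare any supported install mode.", name)
	}
	return fmt.Sprintf("%s supports these install modes: %s.", name, strings.Join(supported, ", "))
}

func main() {
	modes := []InstallMode{
		{Type: "OwnNamespace", Supported: false},
		{Type: "SingleNamespace", Supported: true},
		{Type: "MultiNamespace", Supported: false},
		{Type: "AllNamespaces", Supported: false},
	}
	fmt.Println(canInstallFromConsole(modes))                // true
	fmt.Println(installModeMessage("nginx-operator", modes)) // nginx-operator supports these install modes: SingleNamespace.
}
```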

Additional info:

https://github.com/openshift/console/pull/13232

Bug OCPBUGS-21794: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-api-provider-baremetal/pull/197

Bug OCPBUGS-22655: Bump FCOS image to latest stable

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/installer/pull/7644

Bug OCPBUGS-19257: Update 4.15 operator-lifecycle-manager image to be consistent with ART

Please review the following PR: https://github.com/openshift/operator-framework-olm/pull/564

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/operator-framework-olm/pull/564

Bug OCPBUGS-19291: Update 4.15 ose-csi-driver-shared-resource-mustgather image to be consistent with ART

Please review the following PR: https://github.com/openshift/csi-driver-shared-resource/pull/144

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/csi-driver-shared-resource/pull/144

Bug OCPBUGS-19225: Update 4.15 csi-attacher image to be consistent with ART

Please review the following PR: https://github.com/openshift/csi-external-attacher/pull/57

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/csi-external-attacher/pull/57

Bug OCPBUGS-19868: Avoid panicking on all-fresh-cache evaluation

Description of problem:

The cluster-version operator should not crash while trying to evaluate a bogus condition.

Version-Release number of selected component (if applicable):

4.10 and later are exposed to the bug. It is possible that the OCPBUGS-19512 series increases exposure.

How reproducible:

Unclear.

Steps to Reproduce:

1. Create a cluster.
2. Point it at https://raw.githubusercontent.com/shellyyang1989/upgrade-cincy/master/cincy-conditional-edge.json (you may need to adjust version strings and digests for your test-cluster's release).
3. Wait around 30 minutes.
4. Point it at https://raw.githubusercontent.com/shellyyang1989/upgrade-cincy/master/cincy-conditional-edge-invalid-promql.json (again, may need some customization).

Actual results:

$ grep -B1 -A15 'too fresh' previous.log
I0927 12:07:55.594222       1 cincinnati.go:114] Using a root CA pool with 0 root CA subjects to request updates from https://raw.githubusercontent.com/shellyyang1989/upgrade-cincy/master/cincy-conditional-edge-invalid-promql.json?arch=amd64&channel=stable-4.15&id=dc628f75-7778-457a-bb69-6a31a243c3a9&version=4.15.0-0.test-2023-09-27-091926-ci-ln-01zw7kk-latest
I0927 12:07:55.726463       1 cache.go:118] {"type":"PromQL","promql":{"promql":"0 * group(cluster_version)"}} is the most stale cached cluster-condition match entry, but it is too fresh (last evaluated on 2023-09-27 11:37:25.876804482 +0000 UTC m=+175.082381015).  However, we don't have a cached evaluation for {"type":"PromQL","promql":{"promql":"group(cluster_version_available_updates{channel=buggy})"}}, so attempt to evaluate that now.
I0927 12:07:55.726602       1 cache.go:129] {"type":"PromQL","promql":{"promql":"0 * group(cluster_version)"}} is stealing this cluster-condition match call for {"type":"PromQL","promql":{"promql":"group(cluster_version_available_updates{channel=buggy})"}}, because its last evaluation completed 30m29.849594461s ago
I0927 12:07:55.758573       1 cvo.go:703] Finished syncing available updates "openshift-cluster-version/version" (170.074319ms)
E0927 12:07:55.758847       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 194 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1c4df00?, 0x32abc60})
        /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc001489d40?})
        /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75
panic({0x1c4df00, 0x32abc60})
        /usr/lib/golang/src/runtime/panic.go:884 +0x213
github.com/openshift/cluster-version-operator/pkg/clusterconditions/promql.(*PromQL).Match(0xc0004860e0, {0x220ded8, 0xc00041e550}, 0x0)
        /go/src/github.com/openshift/cluster-version-operator/pkg/clusterconditions/promql/promql.go:134 +0x419
github.com/openshift/cluster-version-operator/pkg/clusterconditions/cache.(*Cache).Match(0xc0002d3ae0, {0x220ded8, 0xc00041e550}, 0xc0033948d0)
        /go/src/github.com/openshift/cluster-version-operator/pkg/clusterconditions/cache/cache.go:132 +0x982
github.com/openshift/cluster-version-operator/pkg/clusterconditions.(*conditionRegistry).Match(0xc000016760, {0x220ded8, 0xc00041e550}, {0xc0033948a0, 0x1, 0x0?})

Expected results:

No panics.

Additional info:

I am still not entirely clear on how OCPBUGS-19512 would have increased exposure.

https://github.com/openshift/cluster-version-operator/pull/975

Bug OCPBUGS-25726: Dev console: Pipelines integration tests was disabled because the operator wasn't available on 4.15

This is a clone of issue OCPBUGS-25206. The following is the description of the original issue:
—
We need to re-enable the Pipelines e2e integration tests as soon as the operator is available again.

https://github.com/openshift/console/pull/13463

Bug OCPBUGS-10906: machine-os-images rhcos version not in sync with installer metadata

A case was found recently (see https://github.com/openshift/machine-os-images/pull/27) where the RHCOS image version stored within machine-os-images was different from the one reported in the installer RHCOS metadata.
This sync is particularly relevant for the agent-based installer, because the create image command can fetch the base ISO either from the machine-os-images content or by direct download, depending on whether the oc command is available in the current execution environment.

Even though this scenario is very unlikely in production, a missing sync between machine-os-images and the installer metadata may produce different results depending on the environment and, moreover, can silently hide severe issues.

https://github.com/openshift/installer/pull/7030

Bug OCPBUGS-19204: Update 4.15 prometheus-operator-admission-webhook image to be consistent with ART

Please review the following PR: https://github.com/openshift/prometheus-operator/pull/244

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/prometheus-operator/pull/244

Bug OCPBUGS-19365: Azure cluster installation failed with sdn plugin

Description of problem:

Azure cluster installation fails with the SDN network plugin.

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2023-09-17-045811
4.13.0-0.nightly-2023-09-18-210322

How reproducible:

Sometimes; 2 of 5 CI jobs failed.

Steps to Reproduce:

1. Install an Azure cluster with the template aos-4_15/ipi-on-azure/versioned-installer-customer_vpc.

Actual results:

Installation failed:
09-19 10:56:47.536  level=info msg=Cluster operator node-tuning Progressing is True with Reconciling: Working towards "4.15.0-0.nightly-2023-09-17-045811"
09-19 10:56:47.536  level=info msg=Cluster operator openshift-apiserver Progressing is True with APIServerDeployment_PodsUpdating: APIServerDeploymentProgressing: deployment/apiserver.openshift-apiserver: 1/3 pods have been updated to the latest generation
09-19 10:56:47.536  level=info msg=Cluster operator openshift-controller-manager Progressing is True with _DesiredStateNotYetAchieved: Progressing: deployment/controller-manager: updated replicas is 1, desired replicas is 3
09-19 10:56:47.536  level=info msg=Progressing: deployment/route-controller-manager: updated replicas is 1, desired replicas is 3
09-19 10:56:47.536  level=info msg=Cluster operator storage Progressing is True with AzureDiskCSIDriverOperatorCR_AzureDiskDriverNodeServiceController_Deploying::AzureFileCSIDriverOperatorCR_AzureFileDriverNodeServiceController_Deploying: AzureDiskCSIDriverOperatorCRProgressing: AzureDiskDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods
09-19 10:56:47.536  level=info msg=AzureFileCSIDriverOperatorCRProgressing: AzureFileDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods
09-19 10:56:47.536  level=error msg=Cluster initialization failed because one or more operators are not functioning properly.
09-19 10:56:47.536  level=error msg=The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
09-19 10:56:47.537  level=error msg=https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
09-19 10:56:47.537  level=error msg=The 'wait-for install-complete' subcommand can then be used to continue the installation
09-19 10:56:47.537  level=error msg=failed to initialize the cluster: Cluster operators authentication, console, control-plane-machine-set, kube-apiserver, machine-config are not available
09-19 10:56:47.537  [ERROR] Installation failed with error code '6'. Aborting execution.

oc get nodes
NAME                                           STATUS     ROLES                  AGE     VERSION
jima41501-c646k-master-0                       NotReady   control-plane,master   3h35m   v1.28.2+fde2a12
jima41501-c646k-master-1                       Ready      control-plane,master   3h35m   v1.28.2+fde2a12
jima41501-c646k-master-2                       Ready      control-plane,master   3h35m   v1.28.2+fde2a12
jima41501-c646k-worker-southcentralus1-x82cb   Ready      worker                 3h22m   v1.28.2+fde2a12
jima41501-c646k-worker-southcentralus2-jxbbt   Ready      worker                 3h19m   v1.28.2+fde2a12
jima41501-c646k-worker-southcentralus3-s4j6c   Ready      worker                 3h18m   v1.28.2+fde2a12
huirwang@huirwang-mac workspace % oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.15.0-0.nightly-2023-09-17-045811   False       True          True       3h31m   WellKnownAvailable: The well-known endpoint is not yet available: kube-apiserver oauth endpoint https://10.0.0.7:6443/.well-known/oauth-authorization-server is not yet served and authentication operator keeps waiting (check kube-apiserver operator, and check that instances roll out successfully, which can take several minutes per instance)
baremetal                                  4.15.0-0.nightly-2023-09-17-045811   True        False         False      3h30m   
cloud-controller-manager                   4.15.0-0.nightly-2023-09-17-045811   True        False         False      3h34m   
cloud-credential                           4.15.0-0.nightly-2023-09-17-045811   True        False         False      3h39m   
cluster-autoscaler                         4.15.0-0.nightly-2023-09-17-045811   True        False         False      3h30m   
config-operator                            4.15.0-0.nightly-2023-09-17-045811   True        False         False      3h31m   
console                                    4.15.0-0.nightly-2023-09-17-045811   False       True          False      3h20m   DeploymentAvailable: 0 replicas available for console deployment...
control-plane-machine-set                  4.15.0-0.nightly-2023-09-17-045811   False       True          False      3h24m   Missing 1 available replica(s)
csi-snapshot-controller                    4.15.0-0.nightly-2023-09-17-045811   True        False         False      3h30m   
dns                                        4.15.0-0.nightly-2023-09-17-045811   True        True          False      3h30m   DNS "default" reports Progressing=True: "Have 5 available node-resolver pods, want 6."
etcd                                       4.15.0-0.nightly-2023-09-17-045811   True        True          True       3h29m   NodeControllerDegraded: The master nodes not ready: node "jima41501-c646k-master-0" not ready since 2023-09-19 02:13:06 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
image-registry                             4.15.0-0.nightly-2023-09-17-045811   True        True          False      3h19m   Progressing: The registry is ready...
ingress                                    4.15.0-0.nightly-2023-09-17-045811   True        False         False      3h19m   
insights                                   4.15.0-0.nightly-2023-09-17-045811   True        False         False      3h19m   
kube-apiserver                             4.15.0-0.nightly-2023-09-17-045811   False       True          True       3h31m   StaticPodsAvailable: 0 nodes are active; 3 nodes are at revision 0; 0 nodes have achieved new revision 8
kube-controller-manager                    4.15.0-0.nightly-2023-09-17-045811   True        True          True       3h27m   NodeControllerDegraded: The master nodes not ready: node "jima41501-c646k-master-0" not ready since 2023-09-19 02:13:06 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
kube-scheduler                             4.15.0-0.nightly-2023-09-17-045811   True        True          True       3h27m   NodeControllerDegraded: The master nodes not ready: node "jima41501-c646k-master-0" not ready since 2023-09-19 02:13:06 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
kube-storage-version-migrator              4.15.0-0.nightly-2023-09-17-045811   True        False         False      3h30m   
machine-api                                4.15.0-0.nightly-2023-09-17-045811   True        False         False      3h17m   
machine-approver                           4.15.0-0.nightly-2023-09-17-045811   True        False         False      3h30m   
machine-config                             4.15.0-0.nightly-2023-09-17-045811   False       False         True       164m    Cluster not available for [{operator 4.15.0-0.nightly-2023-09-17-045811}]: failed to apply machine config daemon manifests: error during waitForDaemonsetRollout: [context deadline exceeded, daemonset machine-config-daemon is not ready. status: (desired: 6, updated: 6, ready: 5, unavailable: 1)]
marketplace                                4.15.0-0.nightly-2023-09-17-045811   True        False         False      3h30m   
monitoring                                 4.15.0-0.nightly-2023-09-17-045811   True        False         False      3h15m   
network                                    4.15.0-0.nightly-2023-09-17-045811   True        True          False      3h31m   DaemonSet "/openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes)...
node-tuning                                4.15.0-0.nightly-2023-09-17-045811   True        True          False      3h30m   Working towards "4.15.0-0.nightly-2023-09-17-045811"
openshift-apiserver                        4.15.0-0.nightly-2023-09-17-045811   True        True          True       3h24m   APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-apiserver ()
openshift-controller-manager               4.15.0-0.nightly-2023-09-17-045811   True        True          False      3h27m   Progressing: deployment/controller-manager: updated replicas is 1, desired replicas is 3...
openshift-samples                          4.15.0-0.nightly-2023-09-17-045811   True        False         False      3h23m   
operator-lifecycle-manager                 4.15.0-0.nightly-2023-09-17-045811   True        False         False      3h30m   
operator-lifecycle-manager-catalog         4.15.0-0.nightly-2023-09-17-045811   True        False         False      3h30m   
operator-lifecycle-manager-packageserver   4.15.0-0.nightly-2023-09-17-045811   True        False         False      3h25m   
service-ca                                 4.15.0-0.nightly-2023-09-17-045811   True        False         False      3h31m   
storage                                    4.15.0-0.nightly-2023-09-17-045811   True        True          False      3h30m   AzureDiskCSIDriverOperatorCRProgressing: AzureDiskDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods...

[systemd]
Failed Units: 1
  openshift-azure-routes.service
[core@jima41501-c646k-master-0 ~]$ sudo -i
[systemd]
Failed Units: 1
  openshift-azure-routes.service
[root@jima41501-c646k-master-0 ~]# systemctl status openshift-azure-routes.service
× openshift-azure-routes.service - Work around Azure load balancer hairpin
     Loaded: loaded (/etc/systemd/system/openshift-azure-routes.service; static)
     Active: failed (Result: exit-code) since Tue 2023-09-19 02:10:31 UTC; 3h 23min ago
   Duration: 55ms
TriggeredBy: ● openshift-azure-routes.path
    Process: 13908 ExecStart=/bin/bash /opt/libexec/openshift-azure-routes.sh start (code=exited, status=1/FAILURE)
   Main PID: 13908 (code=exited, status=1/FAILURE)
        CPU: 77ms

Sep 19 02:10:31 jima41501-c646k-master-0 systemd[1]: Started Work around Azure load balancer hairpin.
Sep 19 02:10:31 jima41501-c646k-master-0 openshift-azure-routes[13908]: processing v4 vip 10.0.0.4
Sep 19 02:10:31 jima41501-c646k-master-0 openshift-azure-routes[13908]: /opt/libexec/openshift-azure-routes.sh: line 130: ovnkContaine>
Sep 19 02:10:31 jima41501-c646k-master-0 systemd[1]: openshift-azure-routes.service: Main process exited, code=exited, status=1/FAILURE
Sep 19 02:10:31 jima41501-c646k-master-0 systemd[1]: openshift-azure-routes.service: Failed with result 'exit-code'.


4.13 also failed in CI:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.13-e2e-azure-sdn/1703878138968150016/artifacts/e2e-azure-sdn/gather-extra/artifacts/oc_cmds/clusteroperators
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.13.0-0.nightly-2023-09-18-210322   False       True          True       55m     WellKnownAvailable: The well-known endpoint is not yet available: kube-apiserver oauth endpoint https://10.0.0.6:6443/.well-known/oauth-authorization-server is not yet served and authentication operator keeps waiting (check kube-apiserver operator, and check that instances roll out successfully, which can take several minutes per instance)
baremetal                                  4.13.0-0.nightly-2023-09-18-210322   True        False         False      54m     
cloud-controller-manager                   4.13.0-0.nightly-2023-09-18-210322   True        False         False      56m     
cloud-credential                           4.13.0-0.nightly-2023-09-18-210322   True        False         False      58m     
cluster-autoscaler                         4.13.0-0.nightly-2023-09-18-210322   True        False         False      53m     
config-operator                            4.13.0-0.nightly-2023-09-18-210322   True        False         False      55m     
console                                    4.13.0-0.nightly-2023-09-18-210322   False       True          False      45m     DeploymentAvailable: 0 replicas available for console deployment...
control-plane-machine-set                  4.13.0-0.nightly-2023-09-18-210322   False       True          False      47m     Missing 1 available replica(s)
csi-snapshot-controller                    4.13.0-0.nightly-2023-09-18-210322   True        False         False      54m     
dns                                        4.13.0-0.nightly-2023-09-18-210322   True        True          False      53m     DNS "default" reports Progressing=True: "Have 5 available node-resolver pods, want 6."
etcd                                       4.13.0-0.nightly-2023-09-18-210322   True        True          True       52m     NodeControllerDegraded: The master nodes not ready: node "ci-op-pjxb081y-0c3e0-bxvlr-master-0" not ready since 2023-09-18 21:40:51 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
image-registry                             4.13.0-0.nightly-2023-09-18-210322   True        True          False      45m     NodeCADaemonProgressing: The daemon set node-ca is deploying node pods...
ingress                                    4.13.0-0.nightly-2023-09-18-210322   True        False         False      44m     
insights                                   4.13.0-0.nightly-2023-09-18-210322   True        False         False      47m     
kube-apiserver                             4.13.0-0.nightly-2023-09-18-210322   False       True          True       53m     StaticPodsAvailable: 0 nodes are active; 3 nodes are at revision 0; 0 nodes have achieved new revision 10
kube-controller-manager                    4.13.0-0.nightly-2023-09-18-210322   True        True          True       51m     NodeControllerDegraded: The master nodes not ready: node "ci-op-pjxb081y-0c3e0-bxvlr-master-0" not ready since 2023-09-18 21:40:51 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
kube-scheduler                             4.13.0-0.nightly-2023-09-18-210322   True        True          True       51m     NodeControllerDegraded: The master nodes not ready: node "ci-op-pjxb081y-0c3e0-bxvlr-master-0" not ready since 2023-09-18 21:40:51 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
kube-storage-version-migrator              4.13.0-0.nightly-2023-09-18-210322   True        False         False      54m     
machine-api                                4.13.0-0.nightly-2023-09-18-210322   True        False         False      46m     
machine-approver                           4.13.0-0.nightly-2023-09-18-210322   True        False         False      54m     
machine-config                             4.13.0-0.nightly-2023-09-18-210322   False       False         True       31m     Cluster not available for [{operator 4.13.0-0.nightly-2023-09-18-210322}]: failed to apply machine config daemon manifests: error during waitForDaemonsetRollout: [timed out waiting for the condition, daemonset machine-config-daemon is not ready. status: (desired: 6, updated: 6, ready: 5, unavailable: 1)]
marketplace                                4.13.0-0.nightly-2023-09-18-210322   True        False         False      53m     
monitoring                                 4.13.0-0.nightly-2023-09-18-210322   True        False         False      43m     
network                                    4.13.0-0.nightly-2023-09-18-210322   True        True          False      55m     DaemonSet "/openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes)...
node-tuning                                4.13.0-0.nightly-2023-09-18-210322   True        True          False      53m     Working towards "4.13.0-0.nightly-2023-09-18-210322"
openshift-apiserver                        4.13.0-0.nightly-2023-09-18-210322   True        True          True       44m     APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-apiserver (3 containers are waiting in pending apiserver-66d764fbd6-r2s8d pod)
openshift-controller-manager               4.13.0-0.nightly-2023-09-18-210322   True        True          False      54m     Progressing: deployment/controller-manager: updated replicas is 1, desired replicas is 3...
openshift-samples                          4.13.0-0.nightly-2023-09-18-210322   True        False         False      47m     
operator-lifecycle-manager                 4.13.0-0.nightly-2023-09-18-210322   True        False         False      54m     
operator-lifecycle-manager-catalog         4.13.0-0.nightly-2023-09-18-210322   True        False         False      54m     
operator-lifecycle-manager-packageserver   4.13.0-0.nightly-2023-09-18-210322   True        False         False      48m     
service-ca                                 4.13.0-0.nightly-2023-09-18-210322   True        False         False      55m     
storage                                    4.13.0-0.nightly-2023-09-18-210322   True        True          False      54m     AzureDiskCSIDriverOperatorCRProgressing: AzureDiskDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods...

Expected results:


Installation succeeds

Additional info:

We suspect this is caused by PR https://github.com/openshift/machine-config-operator/pull/3878/files

https://github.com/openshift/machine-config-operator/pull/3926

Bug OCPBUGS-23305: Install should skip validate if apivip/ingressvip in different subnet with machine networks for ELB

Description of problem:

Our ELB is at 10.1.235.128, but the machine hosts' default network is in another subnet (192.168.90.0/24). The installation then fails with:

platform.baremetal.apiVIPs: Invalid value: "10.1.235.128": IP expected to be in one of the machine networks: 192.168.90.0/24, platform.baremetal.ingressVIPs: Invalid value: "10.1.235.128": IP expected to be in one of the machine networks: 192.168.90.0/24

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1. Set up a cluster whose load balancer type is "UserManaged", with the API VIP and ingress VIP in a different subnet from the machine network CIDR.

Actual results:

platform.baremetal.apiVIPs: Invalid value: "10.1.235.128": IP expected to be in one of the machine networks: 192.168.90.0/24, platform.baremetal.ingressVIPs: Invalid value: "10.1.235.128": IP expected to be in one of the machine networks: 192.168.90.0/24

Expected results:

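With an external (user-managed) load balancer, the API and ingress VIPs may be in a different subnet from the machine network CIDR, so the installer should skip this validation.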
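The expected behavior can be sketched in Go. This is a minimal illustration, not the installer's implementation (the PR below has the actual fix): the helper validateVIPs and the constant names are hypothetical, and the load balancer type values OpenShiftManagedDefault and UserManaged are assumed to match install-config. The machine-network check runs only when the load balancer is OpenShift-managed.

```go
package main

import (
	"fmt"
	"net/netip"
)

// LoadBalancerType mirrors the install-config load balancer type
// (values assumed here for illustration).
type LoadBalancerType string

const (
	OpenShiftManagedDefault LoadBalancerType = "OpenShiftManagedDefault"
	UserManaged             LoadBalancerType = "UserManaged"
)

// validateVIPs is a hypothetical helper, not the installer's code. It requires
// every VIP to fall inside a machine network only for OpenShift-managed load balancers.
func validateVIPs(lbType LoadBalancerType, vips []string, machineNetworks []string) []error {
	if lbType == UserManaged {
		// An external load balancer may live in any routable subnet.
		return nil
	}
	var errs []error
	var prefixes []netip.Prefix
	for _, cidr := range machineNetworks {
		p, err := netip.ParsePrefix(cidr)
		if err != nil {
			errs = append(errs, fmt.Errorf("invalid machine network %q: %w", cidr, err))
			continue
		}
		prefixes = append(prefixes, p)
	}
	for _, vip := range vips {
		addr, err := netip.ParseAddr(vip)
		if err != nil {
			errs = append(errs, fmt.Errorf("invalid VIP %q: %w", vip, err))
			continue
		}
		inside := false
		for _, p := range prefixes {
			if p.Contains(addr) {
				inside = true
				break
			}
		}
		if !inside {
			errs = append(errs, fmt.Errorf("%q: IP expected to be in one of the machine networks: %v", vip, machineNetworks))
		}
	}
	return errs
}

func main() {
	nets := []string{"192.168.90.0/24"}
	fmt.Println(len(validateVIPs(OpenShiftManagedDefault, []string{"10.1.235.128"}, nets))) // 1
	fmt.Println(len(validateVIPs(UserManaged, []string{"10.1.235.128"}, nets)))             // 0
}
```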

Additional info:

https://github.com/openshift/installer/pull/7803

Bug OCPBUGS-25016: Need to bump api at oc to include the CloudCredential capability

This is a clone of issue OCPBUGS-24834. The following is the description of the original issue:
—
Background:

CCO was made optional in https://issues.redhat.com/browse/OCPEDGE-69. CloudCredential was introduced as a new capability in openshift/api. We need to bump the openshift/api dependency in oc to include the CloudCredential capability so that oc adm release extract works correctly.

Description of problem:

Some relevant CredentialsRequests are not extracted by the following command: oc adm release extract --credentials-requests --included --install-config=install-config.yaml ...
where install-config.yaml looks like the following:
...
capabilities:
  baselineCapabilitySet: None
  additionalEnabledCapabilities:
  - MachineAPI
  - CloudCredential
platform:
  aws:
...

Logs:

...
I1209 19:57:25.968783   79037 extract.go:418] Found manifest 0000_50_cloud-credential-operator_05-iam-ro-credentialsrequest.yaml
I1209 19:57:25.968902   79037 extract.go:429] Excluding Group: "cloudcredential.openshift.io" Kind: "CredentialsRequest" Namespace: "openshift-cloud-credential-operator" Name: "cloud-credential-operator-iam-ro": unrecognized capability names: CloudCredential
...
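The exclusion in the log above can be illustrated with a small Go sketch. It assumes manifests declare their capability through the capability.openshift.io/name annotation (several names joined by "+"), and that the client rejects any name missing from the capability list compiled into its openshift/api dependency. includeManifest is a hypothetical helper, not oc's code; it shows why a client built before the api bump drops the CloudCredential manifests even when install-config.yaml enables that capability.

```go
package main

import (
	"fmt"
	"strings"
)

// capabilityAnnotation is assumed to carry the capability names of a manifest.
const capabilityAnnotation = "capability.openshift.io/name"

// includeManifest is a hypothetical filter. known holds the capability names the
// client was built with; enabled holds those enabled in install-config.yaml.
func includeManifest(annotations map[string]string, known, enabled map[string]bool) (bool, error) {
	value, ok := annotations[capabilityAnnotation]
	if !ok || value == "" {
		return true, nil // not tied to any capability
	}
	var unknown []string
	for _, name := range strings.Split(value, "+") {
		if !known[name] {
			unknown = append(unknown, name)
			continue
		}
		if !enabled[name] {
			return false, nil
		}
	}
	if len(unknown) > 0 {
		return false, fmt.Errorf("unrecognized capability names: %s", strings.Join(unknown, ", "))
	}
	return true, nil
}

func main() {
	ann := map[string]string{capabilityAnnotation: "CloudCredential"}
	enabled := map[string]bool{"MachineAPI": true, "CloudCredential": true}
	oldKnown := map[string]bool{"MachineAPI": true} // client built before the api bump
	newKnown := map[string]bool{"MachineAPI": true, "CloudCredential": true}
	fmt.Println(includeManifest(ann, oldKnown, enabled)) // false unrecognized capability names: CloudCredential
	fmt.Println(includeManifest(ann, newKnown, enabled)) // true <nil>
}
```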

https://github.com/openshift/oc/pull/1623

Bug OCPBUGS-21722: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Bug OCPBUGS-25995: seLinuxMount is missed after changing to csi-operator

This is a clone of issue OCPBUGS-24245. The following is the description of the original issue:
—
The CSIDriver manifest https://github.com/openshift/csi-operator/blob/master/assets/overlays/aws-ebs/base/csidriver.yaml is missing "seLinuxMount: true", which had already been merged in https://github.com/bertinatto/aws-ebs-csi-driver-operator-1/blob/0a9642cff6d2a7f9aea940ce89b65fc189cba6b6/assets/csidriver.yaml#L14

https://github.com/openshift/csi-operator/pull/91

Bug OCPBUGS-19367: The console handler panics on baremetal 4.14.0-rc.0 ipv6 sno cluster

Description of problem:

On a bare-metal 4.14.0-rc.0 IPv6 SNO cluster, after logging in to the admin console as an admin user, there is no Observe menu in the left navigation bar (see screenshot: https://drive.google.com/file/d/13RAXPxtKhAElN9xf8bAmLJa0GI8pP0fH/view?usp=sharing). The monitoring-plugin status is Failed (see: https://drive.google.com/file/d/1YsSaGdLT4bMn-6E-WyFWbOpwvDY4t6na/view?usp=sharing), with the error:

Failed to get a valid plugin manifest from /api/plugins/monitoring-plugin/
r: Bad Gateway

The console logs show "9443: connect: connection refused":

$ oc -n openshift-console logs console-6869f8f4f4-56mbj
...
E0915 12:50:15.498589       1 handlers.go:164] GET request for "monitoring-plugin" plugin failed: Get "https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json": dial tcp [fd02::f735]:9443: connect: connection refused
2023/09/15 12:50:15 http: panic serving [fd01:0:0:1::2]:39156: runtime error: invalid memory address or nil pointer dereference
goroutine 183760 [running]:
net/http.(*conn).serve.func1()
    /usr/lib/golang/src/net/http/server.go:1854 +0xbf
panic({0x3259140, 0x4fcc150})
    /usr/lib/golang/src/runtime/panic.go:890 +0x263
github.com/openshift/console/pkg/plugins.(*PluginsHandler).proxyPluginRequest(0xc0003b5760, 0x2?, {0xc0009bc7d1, 0x11}, {0x3a41fa0, 0xc0002f6c40}, 0xb?)
    /go/src/github.com/openshift/console/pkg/plugins/handlers.go:165 +0x582
github.com/openshift/console/pkg/plugins.(*PluginsHandler).HandlePluginAssets(0xaa00000000000010?, {0x3a41fa0, 0xc0002f6c40}, 0xc0001f7500)
    /go/src/github.com/openshift/console/pkg/plugins/handlers.go:147 +0x26d
github.com/openshift/console/pkg/server.(*Server).HTTPHandler.func23({0x3a41fa0?, 0xc0002f6c40?}, 0x7?)
    /go/src/github.com/openshift/console/pkg/server/server.go:604 +0x33
net/http.HandlerFunc.ServeHTTP(...)
    /usr/lib/golang/src/net/http/server.go:2122
github.com/openshift/console/pkg/server.authMiddleware.func1(0xc0001f7500?, {0x3a41fa0?, 0xc0002f6c40?}, 0xd?)
    /go/src/github.com/openshift/console/pkg/server/middleware.go:25 +0x31
github.com/openshift/console/pkg/server.authMiddlewareWithUser.func1({0x3a41fa0, 0xc0002f6c40}, 0xc0001f7500)
    /go/src/github.com/openshift/console/pkg/server/middleware.go:81 +0x46c
net/http.HandlerFunc.ServeHTTP(0x5120938?, {0x3a41fa0?, 0xc0002f6c40?}, 0x7ffb6ea27f18?)
    /usr/lib/golang/src/net/http/server.go:2122 +0x2f
net/http.StripPrefix.func1({0x3a41fa0, 0xc0002f6c40}, 0xc0001f7400)
    /usr/lib/golang/src/net/http/server.go:2165 +0x332
net/http.HandlerFunc.ServeHTTP(0xc001102c00?, {0x3a41fa0?, 0xc0002f6c40?}, 0xc000655a00?)
    /usr/lib/golang/src/net/http/server.go:2122 +0x2f
net/http.(*ServeMux).ServeHTTP(0x34025e0?, {0x3a41fa0, 0xc0002f6c40}, 0xc0001f7400)
    /usr/lib/golang/src/net/http/server.go:2500 +0x149
github.com/openshift/console/pkg/server.securityHeadersMiddleware.func1({0x3a41fa0, 0xc0002f6c40}, 0x3305040?)
    /go/src/github.com/openshift/console/pkg/server/middleware.go:128 +0x3af
net/http.HandlerFunc.ServeHTTP(0x0?, {0x3a41fa0?, 0xc0002f6c40?}, 0x11db52e?)
    /usr/lib/golang/src/net/http/server.go:2122 +0x2f
net/http.serverHandler.ServeHTTP({0xc0008201e0?}, {0x3a41fa0, 0xc0002f6c40}, 0xc0001f7400)
    /usr/lib/golang/src/net/http/server.go:2936 +0x316
net/http.(*conn).serve(0xc0009b4120, {0x3a43e70, 0xc001223500})
    /usr/lib/golang/src/net/http/server.go:1995 +0x612
created by net/http.(*Server).Serve
    /usr/lib/golang/src/net/http/server.go:3089 +0x5ed
I0915 12:50:24.267777       1 handlers.go:118] User settings ConfigMap "user-settings-4b4c2f4d-159c-4358-bba3-3d87f113cd9b" already exist, will return existing data.
I0915 12:50:24.267813       1 handlers.go:118] User settings ConfigMap "user-settings-4b4c2f4d-159c-4358-bba3-3d87f113cd9b" already exist, will return existing data.
E0915 12:50:30.155515       1 handlers.go:164] GET request for "monitoring-plugin" plugin failed: Get "https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json": dial tcp [fd02::f735]:9443: connect: connection refused
2023/09/15 12:50:30 http: panic serving [fd01:0:0:1::2]:42990: runtime error: invalid memory address or nil pointer dereference

Port 9443 refuses connections:

$ oc -n openshift-monitoring get pod -o wide
NAME                                                     READY   STATUS    RESTARTS   AGE     IP                  NODE    NOMINATED NODE   READINESS GATES
alertmanager-main-0                                      6/6     Running   6          3d22h   fd01:0:0:1::564     sno-2   <none>           <none>
cluster-monitoring-operator-6cb777d488-nnpmx             1/1     Running   4          7d16h   fd01:0:0:1::12      sno-2   <none>           <none>
kube-state-metrics-dc5f769bc-p97m7                       3/3     Running   12         7d16h   fd01:0:0:1::3b      sno-2   <none>           <none>
monitoring-plugin-85bfb98485-d4g5x                       1/1     Running   4          7d16h   fd01:0:0:1::55      sno-2   <none>           <none>
node-exporter-ndnnj                                      2/2     Running   8          7d16h   2620:52:0:165::41   sno-2   <none>           <none>
openshift-state-metrics-78df59b4d5-j6r5s                 3/3     Running   12         7d16h   fd01:0:0:1::3a      sno-2   <none>           <none>
prometheus-adapter-6f86f7d8f5-ttflf                      1/1     Running   0          4h23m   fd01:0:0:1::b10c    sno-2   <none>           <none>
prometheus-k8s-0                                         6/6     Running   6          3d22h   fd01:0:0:1::566     sno-2   <none>           <none>
prometheus-operator-7c94855989-csts2                     2/2     Running   8          7d16h   fd01:0:0:1::39      sno-2   <none>           <none>
prometheus-operator-admission-webhook-7bb64b88cd-bvq8m   1/1     Running   4          7d16h   fd01:0:0:1::37      sno-2   <none>           <none>
thanos-querier-5bbb764599-vlztq                          6/6     Running   6          3d22h   fd01:0:0:1::56a     sno-2   <none>           <none>

$  oc -n openshift-monitoring get svc monitoring-plugin
NAME                TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
monitoring-plugin   ClusterIP   fd02::f735   <none>        9443/TCP   7d16h


$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -v 'https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json' | jq
*   Trying fd02::f735...
* TCP_NODELAY set
* connect to fd02::f735 port 9443 failed: Connection refused
* Failed to connect to monitoring-plugin.openshift-monitoring.svc.cluster.local port 9443: Connection refused
* Closing connection 0
curl: (7) Failed to connect to monitoring-plugin.openshift-monitoring.svc.cluster.local port 9443: Connection refused
command terminated with exit code 7

The issue does not occur on another 4.14.0-rc.0 IPv4 cluster, but it does reproduce on another 4.14.0-rc.0 IPv6 cluster.
On the 4.14.0-rc.0 IPv4 cluster:

$ oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-rc.0   True        False         20m     Cluster version is 4.14.0-rc.0

$ oc -n openshift-monitoring get pod -o wide | grep monitoring-plugin
monitoring-plugin-85bfb98485-nh428                       1/1     Running   0          4m      10.128.0.107   ci-ln-pby4bj2-72292-l5q8v-master-0   <none>           <none>

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k  'https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json' | jq
...
{
  "name": "monitoring-plugin",
  "version": "1.0.0",
  "displayName": "OpenShift console monitoring plugin",
  "description": "This plugin adds the monitoring UI to the OpenShift web console",
  "dependencies": {
    "@console/pluginAPI": "*"
  },
  "extensions": [
    {
      "type": "console.page/route",
      "properties": {
        "exact": true,
        "path": "/monitoring",
        "component": {
          "$codeRef": "MonitoringUI"
        }
      }
    },
...

On the 4.14.0-rc.0 IPv6 cluster (launched with cluster-bot: launch 4.14.0-rc.0 metal,ipv6), the same "9443: Connection refused" error occurs after logging in to the console:

$ oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-rc.0   True        False         44m     Cluster version is 4.14.0-rc.0
$ oc -n openshift-monitoring get pod -o wide | grep monitoring-plugin
monitoring-plugin-bd6ffdb5d-b5csk                        1/1     Running   0          53m   fd01:0:0:4::b             worker-0.ostest.test.metalkube.org   <none>           <none>
monitoring-plugin-bd6ffdb5d-vhtpf                        1/1     Running   0          53m   fd01:0:0:5::9             worker-2.ostest.test.metalkube.org   <none>           <none>
$ oc -n openshift-monitoring get svc monitoring-plugin
NAME                TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
monitoring-plugin   ClusterIP   fd02::402d   <none>        9443/TCP   59m

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -v 'https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json' | jq
*   Trying fd02::402d...
* TCP_NODELAY set
* connect to fd02::402d port 9443 failed: Connection refused
* Failed to connect to monitoring-plugin.openshift-monitoring.svc.cluster.local port 9443: Connection refused
* Closing connection 0
curl: (7) Failed to connect to monitoring-plugin.openshift-monitoring.svc.cluster.local port 9443: Connection refused
command terminated with exit code 7

$ oc -n openshift-console get pod | grep console
console-5cffbc7964-7ljft     1/1     Running   0          56m
console-5cffbc7964-d864q     1/1     Running   0          56m

$ oc -n openshift-console logs console-5cffbc7964-7ljft
...
E0916 14:34:16.330117       1 handlers.go:164] GET request for "monitoring-plugin" plugin failed: Get "https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json": dial tcp [fd02::402d]:9443: connect: connection refused
2023/09/16 14:34:16 http: panic serving [fd01:0:0:4::2]:37680: runtime error: invalid memory address or nil pointer dereference
goroutine 3985 [running]:
net/http.(*conn).serve.func1()
    /usr/lib/golang/src/net/http/server.go:1854 +0xbf
panic({0x3259140, 0x4fcc150})
    /usr/lib/golang/src/runtime/panic.go:890 +0x263
github.com/openshift/console/pkg/plugins.(*PluginsHandler).proxyPluginRequest(0xc0008f6780, 0x2?, {0xc000665211, 0x11}, {0x3a41fa0, 0xc0009221c0}, 0xb?)
    /go/src/github.com/openshift/console/pkg/plugins/handlers.go:165 +0x582
github.com/openshift/console/pkg/plugins.(*PluginsHandler).HandlePluginAssets(0xfe00000000000010?, {0x3a41fa0, 0xc0009221c0}, 0xc000d8d600)
    /go/src/github.com/openshift/console/pkg/plugins/handlers.go:147 +0x26d
github.com/openshift/console/pkg/server.(*Server).HTTPHandler.func23({0x3a41fa0?, 0xc0009221c0?}, 0x7?)
    /go/src/github.com/openshift/console/pkg/server/server.go:604 +0x33
net/http.HandlerFunc.ServeHTTP(...)
    /usr/lib/golang/src/net/http/server.go:2122
github.com/openshift/console/pkg/server.authMiddleware.func1(0xc000d8d600?, {0x3a41fa0?, 0xc0009221c0?}, 0xd?)
    /go/src/github.com/openshift/console/pkg/server/middleware.go:25 +0x31
github.com/openshift/console/pkg/server.authMiddlewareWithUser.func1({0x3a41fa0, 0xc0009221c0}, 0xc000d8d600)
    /go/src/github.com/openshift/console/pkg/server/middleware.go:81 +0x46c
net/http.HandlerFunc.ServeHTTP(0xc000653830?, {0x3a41fa0?, 0xc0009221c0?}, 0x7f824506bf18?)
    /usr/lib/golang/src/net/http/server.go:2122 +0x2f
net/http.StripPrefix.func1({0x3a41fa0, 0xc0009221c0}, 0xc000d8d500)
    /usr/lib/golang/src/net/http/server.go:2165 +0x332
net/http.HandlerFunc.ServeHTTP(0xc00007e800?, {0x3a41fa0?, 0xc0009221c0?}, 0xc000b2da00?)
    /usr/lib/golang/src/net/http/server.go:2122 +0x2f
net/http.(*ServeMux).ServeHTTP(0x34025e0?, {0x3a41fa0, 0xc0009221c0}, 0xc000d8d500)
    /usr/lib/golang/src/net/http/server.go:2500 +0x149
github.com/openshift/console/pkg/server.securityHeadersMiddleware.func1({0x3a41fa0, 0xc0009221c0}, 0x3305040?)
    /go/src/github.com/openshift/console/pkg/server/middleware.go:128 +0x3af
net/http.HandlerFunc.ServeHTTP(0x0?, {0x3a41fa0?, 0xc0009221c0?}, 0x11db52e?)
    /usr/lib/golang/src/net/http/server.go:2122 +0x2f
net/http.serverHandler.ServeHTTP({0xc000db9b00?}, {0x3a41fa0, 0xc0009221c0}, 0xc000d8d500)
    /usr/lib/golang/src/net/http/server.go:2936 +0x316
net/http.(*conn).serve(0xc000653680, {0x3a43e70, 0xc000676f30})
    /usr/lib/golang/src/net/http/server.go:1995 +0x612
created by net/http.(*Server).Serve
    /usr/lib/golang/src/net/http/server.go:3089 +0x5ed

Version-Release number of selected component (if applicable):

Bare-metal 4.14.0-rc.0 IPv6 SNO cluster:
$ token=`oc create token prometheus-k8s -n openshift-monitoring`
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query=virt_platform'  | jq
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "virt_platform",
          "baseboard_manufacturer": "Dell Inc.",
          "baseboard_product_name": "01J4WF",
          "bios_vendor": "Dell Inc.",
          "bios_version": "1.10.2",
          "container": "kube-rbac-proxy",
          "endpoint": "https",
          "instance": "sno-2",
          "job": "node-exporter",
          "namespace": "openshift-monitoring",
          "pod": "node-exporter-ndnnj",
          "prometheus": "openshift-monitoring/k8s",
          "service": "node-exporter",
          "system_manufacturer": "Dell Inc.",
          "system_product_name": "PowerEdge R750",
          "system_version": "Not Specified",
          "type": "none"
        },
        "value": [
          1694785092.664,
          "1"
        ]
      }
    ]
  }
}

How reproducible:

only seen on this cluster

Steps to Reproduce:

1. see the description

Actual results:

No Observe menu in the admin console; the monitoring-plugin status is Failed.

Expected results:

no error

https://github.com/openshift/console/pull/13166

Bug OCPBUGS-23084: Test "start build with broken proxy should start a build and wait for the build to fail [apigroup:build.openshift.io]" is too loose with checking for errors

Description of problem:

When the "start build with broken proxy should start a build and wait for the build to fail [apigroup:build.openshift.io]" test runs, it expects the build to fail before the text "clone" appears in its log.
Adding a variant of this test that exercises the same functionality with an unprivileged build involves raising the logging level so that the builder logs information the test can look for to confirm that it ran in unprivileged mode. I'd like the builder to print the name under which it was invoked, so that it's easier to find where a particular container's output starts in the log, but that name is openshift-git-clone, which itself contains "clone". The log message that indicates the test failed includes the text "git clone", so I'd like to amend the test to fail when that text is found in the log instead.
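The proposed stricter check can be sketched in Go. cloneAttempted is a hypothetical helper and the log text in main is made up for illustration; the point is only that "openshift-git-clone" contains "clone" but not "git clone", so matching the longer string keeps the verbose invocation name from tripping the check.

```go
package main

import (
	"fmt"
	"strings"
)

// cloneAttempted reports whether the build log shows a clone attempt.
// It matches "git clone" rather than the bare word "clone", because the
// builder's invocation name, openshift-git-clone, contains only the latter.
func cloneAttempted(buildLog string) bool {
	return strings.Contains(buildLog, "git clone")
}

func main() {
	// Hypothetical verbose log from a build that failed on the broken proxy.
	verboseLog := "openshift-git-clone: starting\nerror: proxy connection failed"
	fmt.Println(strings.Contains(verboseLog, "clone")) // true: the loose check misfires
	fmt.Println(cloneAttempted(verboseLog))            // false: the stricter check does not
}
```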

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Modify the test to increase the logging level for its test build.
2. Apply https://github.com/openshift/builder/pull/358 to the builder image.
3. Run the test.

Actual results:

The test always fails (or "fails").

Expected results:

The test passes, unless we broke something somewhere.

Additional info:

https://github.com/openshift/origin/pull/28352

Bug OCPBUGS-24112: Update 4.15 ose-cluster-storage-operator-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-storage-operator/pull/424

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-storage-operator/pull/424

Bug OCPBUGS-24995: [azure] bootstrap failed to be provisioned when vm type is set to Standard_NP10s

Description of problem:

Configure the VM type as Standard_NP10s in install-config.yaml. This size supports only Hyper-V Generation 1 (V1).
--------------
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    azure:
      type: Standard_NP10s
  replicas: 3
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform:
    azure:
      type: Standard_NP10s
  replicas: 3

Continue the installation; the installer fails when provisioning the bootstrap node.
--------------
ERROR                                              
ERROR Error: creating Linux Virtual Machine: (Name "jima1211test-rqfhm-bootstrap" / Resource Group "jima1211test-rqfhm-rg"): compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="BadRequest" Message="The selected VM size 'Standard_NP10s' cannot boot Hypervisor Generation '2'. If this was a Create operation please check that the Hypervisor Generation of the Image matches the Hypervisor Generation of the selected VM Size. If this was an Update operation please select a Hypervisor Generation '2' VM Size. For more information, see https://aka.ms/azuregen2vm" 
ERROR                                              
ERROR   with azurerm_linux_virtual_machine.bootstrap, 
ERROR   on main.tf line 193, in resource "azurerm_linux_virtual_machine" "bootstrap": 
ERROR  193: resource "azurerm_linux_virtual_machine" "bootstrap" { 
ERROR                                              
ERROR failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failure applying terraform for "bootstrap" stage: error applying Terraform configs: failed to apply Terraform: exit status 1 
ERROR                                              
ERROR Error: creating Linux Virtual Machine: (Name "jima1211test-rqfhm-bootstrap" / Resource Group "jima1211test-rqfhm-rg"): compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="BadRequest" Message="The selected VM size 'Standard_NP10s' cannot boot Hypervisor Generation '2'. If this was a Create operation please check that the Hypervisor Generation of the Image matches the Hypervisor Generation of the selected VM Size. If this was an Update operation please select a Hypervisor Generation '2' VM Size. For more information, see https://aka.ms/azuregen2vm" 
ERROR                                              
ERROR   with azurerm_linux_virtual_machine.bootstrap, 
ERROR   on main.tf line 193, in resource "azurerm_linux_virtual_machine" "bootstrap": 
ERROR  193: resource "azurerm_linux_virtual_machine" "bootstrap" { 
ERROR                                              
ERROR                                              

It seems that the issue was introduced by https://github.com/openshift/installer/pull/7642/
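The Azure error means the bootstrap VM was created from a Hyper-V Generation 2 image on a size that boots only Generation 1. A minimal Go sketch of choosing the generation from the VM size instead, assuming the value comes from the size's HyperVGenerations resource SKU capability (for example "V1" or "V1,V2"); pickHyperVGeneration is hypothetical and is not the installer's fix.

```go
package main

import (
	"fmt"
	"strings"
)

// pickHyperVGeneration is a hypothetical helper. capability is the VM size's
// HyperVGenerations value (assumed comma-separated). It prefers V2 and falls
// back to V1 when the size cannot boot Generation 2 images.
func pickHyperVGeneration(capability string) (string, error) {
	supported := map[string]bool{}
	for _, g := range strings.Split(capability, ",") {
		supported[strings.TrimSpace(g)] = true
	}
	switch {
	case supported["V2"]:
		return "V2", nil
	case supported["V1"]:
		return "V1", nil
	default:
		return "", fmt.Errorf("no supported Hyper-V generation in %q", capability)
	}
}

func main() {
	for _, c := range []string{"V1", "V1,V2", ""} {
		gen, err := pickHyperVGeneration(c)
		fmt.Printf("%q -> %q, err=%v\n", c, gen, err)
	}
}
```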

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2023-12-09-012410

How reproducible:

Always

Steps to Reproduce:

    1. Configure the VM type as Standard_NP10s for the control plane in install-config.yaml.
    2. Install the cluster.

Actual results:

    The installer fails when provisioning the bootstrap node.

Expected results:

    The installation succeeds.

Additional info:

https://github.com/openshift/installer/pull/7822

Bug OCPBUGS-19173: Update 4.15 openshift-enterprise-console-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/console-operator/pull/794

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Bug OCPBUGS-25144: MachineConfigNode lister fires unexpectedly

Description of problem: The MachineConfigNode (MCN) lister fires in the operator pod before the CRD exists. This causes API errors and could impact upgrades.

Version-Release number of selected component (if applicable):

How reproducible: Always

Steps to Reproduce:

    1. Upgrade to 4.15 from any version.

Actual results:

I1211 18:44:40.972098       1 operator.go:347] Starting MachineConfigOperator
I1211 18:44:40.982079       1 event.go:298] Event(v1.ObjectReference{Kind:"", Namespace:"openshift-machine-config-operator", Name:"machine-config", UID:"68bc5e8f-b7f5-4506-a870-2eecaa5afd35", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorVersionChanged' clusteroperator/machine-config-operator started a version change from [{operator 4.14.6}] to [{operator 4.15.0-0.nightly-2023-12-11-033133}]
W1211 18:44:41.255502       1 reflector.go:535] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
E1211 18:44:41.255587       1 reflector.go:147] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: Failed to watch *v1alpha1.MachineConfigNode: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
E1211 18:58:04.915119       1 reflector.go:147] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: Failed to watch *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
W1211 18:58:06.425952       1 reflector.go:535] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
E1211 18:58:06.426037       1 reflector.go:147] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: Failed to watch *v1alpha1.MachineConfigNode: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
W1211 18:58:09.396004       1 reflector.go:535] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
E1211 18:58:09.396068       1 reflector.go:147] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: Failed to watch *v1alpha1.MachineConfigNode: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
W1211 18:58:14.540488       1 reflector.go:535] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
E1211 18:58:14.540560       1 reflector.go:147] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: Failed to watch *v1alpha1.MachineConfigNode: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
W1211 18:58:25.293029       1 reflector.go:535] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
E1211 18:58:25.293095       1 reflector.go:147] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: Failed to watch *v1alpha1.MachineConfigNode: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
W1211 18:58:50.166866       1 reflector.go:535] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
E1211 18:58:50.166903       1 reflector.go:147] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: Failed to watch *v1alpha1.MachineConfigNode: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
W1211 18:59:39.950454       1 reflector.go:535] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
E1211 18:59:39.950523       1 reflector.go:147] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: Failed to watch *v1alpha1.MachineConfigNode: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
W1211 19:00:23.432005       1 reflector.go:535] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
E1211 19:00:23.432038       1 reflector.go:147] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: Failed to watch *v1alpha1.MachineConfigNode: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
W1211 19:01:13.237298       1 reflector.go:535] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
E1211 19:01:13.237382       1 reflector.go:147] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: Failed to watch *v1alpha1.MachineConfigNode: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
W1211 19:02:02.035555       1 reflector.go:535] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
E1211 19:02:02.035628       1 reflector.go:147] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: Failed to watch *v1alpha1.MachineConfigNode: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
W1211 19:02:52.111260       1 reflector.go:535] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
E1211 19:02:52.111332       1 reflector.go:147] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: Failed to watch *v1alpha1.MachineConfigNode: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
W1211 19:03:38.243461       1 reflector.go:535] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
E1211 19:03:38.243499       1 reflector.go:147] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: Failed to watch *v1alpha1.MachineConfigNode: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
W1211 19:04:27.848493       1 reflector.go:535] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
E1211 19:04:27.848585       1 reflector.go:147] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: Failed to watch *v1alpha1.MachineConfigNode: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
E1211 19:05:37.064033       1 sync.go:1250] Error syncing Required MachineConfigPools: "error MachineConfigPool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 1)"
E1211 19:05:38.057685       1 sync.go:1250] Error syncing Required MachineConfigPools: "error MachineConfigPool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 1)"
E1211 19:05:39.036638       1 sync.go:1250] Error syncing Required MachineConfigPools: "error MachineConfigPool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 1)"
E1211 19:05:40.039736       1 sync.go:1250] Error syncing Required MachineConfigPools: "error MachineConfigPool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 1)"
E1211 19:05:41.039696       1 sync.go:1250] Error syncing Required MachineConfigPools: "error MachineConfigPool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 1)"
E1211 19:05:42.034840       1 sync.go:1250] Error syncing Required MachineConfigPools: "error MachineConfigPool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 1)"
E1211 19:05:43.044901       1 sync.go:1250] Error syncing Required MachineConfigPools: "error MachineConfigPool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 1)"
E1211 19:05:44.033229       1 sync.go:1250] Error syncing Required MachineConfigPools: "error MachineConfigPool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 1)"
E1211 19:05:45.034792       1 sync.go:1250] Error syncing Required MachineConfigPools: "error MachineConfigPool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 1)"
E1211 19:05:46.052866       1 sync.go:1250] Error syncing Required MachineConfigPools: "error MachineConfigPool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 1)"

Expected results:

Additional info:

https://github.com/openshift/machine-config-operator/pull/4071

Bug OCPBUGS-25552: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-node-tuning-operator/pull/886

Bug MGMT-15683: Custom Manifest - able to create with api manifest with empty name .yaml

Description of the problem:
I am able to create a custom manifest named ".yaml" (that is, with an empty base name).
I believe the API should block this.

How reproducible:
Using test-infra, I create a manifest with the file name ".yaml".

Steps to reproduce:

1. Using v2_create_cluster_manifest, I am able to create a manifest with the file name ".yaml".

Actual results:
The manifest is created, no error is returned, and I am able to list the manifest and see that it is applied to the cluster.

Expected results:
The API should reject the request with a 422 (Unprocessable Entity) error.
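The expected validation can be sketched in Python as follows. The function name, the accepted-extension set, and the messages are assumptions for illustration; assisted-service's actual validation may differ.

```python
"""Sketch: reject custom manifest file names with an empty base name."""
from http import HTTPStatus

# Assumed set of accepted extensions, for illustration only.
ALLOWED_EXTENSIONS = {"yaml", "yml", "json"}


def validate_manifest_file_name(file_name: str) -> tuple[HTTPStatus, str]:
    """Return (status, message); 422 for names such as ".yaml"."""
    name = file_name.strip()
    # rpartition splits on the last dot: ".yaml" -> ("", ".", "yaml").
    # (os.path.splitext would treat ".yaml" as a dotfile with no extension.)
    stem, dot, ext = name.rpartition(".")
    if not dot or ext.lower() not in ALLOWED_EXTENSIONS:
        return HTTPStatus.UNPROCESSABLE_ENTITY, f"unsupported manifest file name {file_name!r}"
    if not stem:
        return HTTPStatus.UNPROCESSABLE_ENTITY, "manifest file name must not be empty"
    return HTTPStatus.OK, "ok"
```

Stripping surrounding whitespace first means a name such as ".yaml " is rejected the same way as ".yaml".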

https://github.com/openshift/assisted-service/pull/5635

Bug OCPBUGS-23783: After PatternFly5 update: Topology > Service binding misses application grouping

Issues 45 and 55 from https://docs.google.com/spreadsheets/d/1TR3ENY-GE_LQL9F-xH6NahtRHdu6IrYGhLqDDA8y0EI/edit#gid=1035185624

When an operator-backed resource is created with a service binding, the application group is not shown in the Topology view.

Note: Is this really PF5-related, or does this issue exist already on 4.14?

Screenshot: https://drive.google.com/drive/u/1/folders/1OKeJ8PPGZi-1QyqQ184xQznmqii37NNB

https://github.com/openshift/console/pull/13376

Task MGMT-16236: Remove elastic APM dependency

The Elastic APM dependency appears to be unused.

https://github.com/openshift/assisted-service/pull/5702

Bug OCPBUGS-20110: Add a unit test - at least one interface must be defined for each node

Description of problem:

The unit tests do not cover the scenario in which hosts are provided without any interfaces in agent-config.yaml.

Version-Release number of selected component (if applicable):

How reproducible:

Always

Steps to Reproduce:

Actual results:

No unit test covers this case.

Expected results:

A unit test that checks for the error message "at least one interface must be defined for each node".
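The installer's test is written in Go; as a language-neutral sketch, the Python below shows the shape of such a test: a validator that returns the expected error message for hosts without interfaces, and a test that asserts it. The host dictionaries are a simplified, assumed form of agent-config.yaml host entries, and the `hosts[i].interfaces` error prefix is hypothetical.

```python
"""Sketch: validate that every host in agent-config.yaml defines an interface."""

ERR_NO_INTERFACES = "at least one interface must be defined for each node"


def validate_hosts(hosts: list[dict]) -> list[str]:
    """Return one error string per host that has no interfaces."""
    errors = []
    for i, host in enumerate(hosts):
        if not host.get("interfaces"):  # missing key, None, or empty list
            errors.append(f"hosts[{i}].interfaces: {ERR_NO_INTERFACES}")
    return errors


def test_host_without_interfaces_is_rejected() -> None:
    hosts = [
        {"hostname": "master-0", "interfaces": [{"name": "eth0", "macAddress": "00:00:00:00:00:01"}]},
        {"hostname": "master-1", "interfaces": []},
        {"hostname": "master-2"},
    ]
    errors = validate_hosts(hosts)
    assert len(errors) == 2
    assert all(ERR_NO_INTERFACES in e for e in errors)
    assert errors[0].startswith("hosts[1]")


if __name__ == "__main__":
    test_host_without_interfaces_is_rejected()
    print("ok")
```

Covering both an empty list and a missing key keeps the test honest about the two ways a host can end up with no interfaces.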

Additional info:

https://github.com/openshift/installer/pull/7555

Bug OCPBUGS-23102: Metal jobs failing due to inability to reach thanos

All metal jobs failed many tests with errors looking up the Thanos querier DNS record.

Example job
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-metal-ipi-ovn-ipv6/1722507545064509440

{ fail [github.com/openshift/origin/test/extended/prometheus/prometheus.go:106]: Failed to fetch alerting rules: unable to query https://thanos-querier-openshift-monitoring.apps.ostest.test.metalkube.org/api/v1/rules: Get "https://thanos-querier-openshift-monitoring.apps.ostest.test.metalkube.org/api/v1/rules": dial tcp: lookup thanos-querier-openshift-monitoring.apps.ostest.test.metalkube.org on 172.30.0.10:53: no such host: %!w(<nil>) Ginkgo exit error 1: exit with code 1}

[sig-instrumentation][Late] OpenShift alerting rules [apigroup:image.openshift.io] should link to an HTTP(S) location if the runbook_url annotation is defined [Suite:openshift/conformance/parallel]

https://github.com/openshift/origin/pull/28389

Bug OCPBUGS-23779: After PatternFly5 update: YAML editor > Show tooltips let the page crash

Issue 52 from https://docs.google.com/spreadsheets/d/1TR3ENY-GE_LQL9F-xH6NahtRHdu6IrYGhLqDDA8y0EI/edit#gid=1035185624

Resource YAML view: clicking "Show tooltips" crashes the current page.

Screenshot: https://drive.google.com/file/d/1lT3mUAPIm0ba5tNVDW3Ztz6Hgj4D1DFz/view?usp=drive_link

https://github.com/openshift/console/pull/13382

Bug OCPBUGS-19648: Introduce a node-identity with a validating webhook

https://github.com/openshift/cluster-network-operator/pull/1983

Bug OCPBUGS-22403: azure techpreview jobs are failing

The Azure tech-preview job has been permanently failing for about a week:

https://sippy.dptools.openshift.org/sippy-ng/jobs/4.15/analysis?filters=%7B%22items%22%3A%5B%7B%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22equals%22%2C%22value%22%3A%22periodic-ci-openshift-release-master-ci-4.15-e2e-azure-sdn-techpreview%22%7D%5D%7D

Example: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-azure-sdn-techpreview/1717112156069040128

A pod is stuck in ImagePullBackOff:

NAME                                                       READY   STATUS             RESTARTS        AGE
azureserviceoperator-controller-manager-6b8fc86684-qgrvc   0/2     ImagePullBackOff   0               6h54m
capi-controller-manager-6f96987c5c-zmkpc                   1/1     Running            0               6h54m
capi-operator-controller-manager-578b9bd48f-gkgzv          2/2     Running            1 (6h55m ago)   7h2m
capz-controller-manager-5c6cb77b99-sh98n                   1/1     Running            0               6h54m
cluster-capi-operator-5974b7684b-4qjwn                     1/1     Running            0               7h2m

  containerStatuses:
  - image: registry.ci.openshift.org/openshift:kube-rbac-proxy
    imageID: ""
    lastState: {}
    name: kube-rbac-proxy
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        message: Back-off pulling image "registry.ci.openshift.org/openshift:kube-rbac-proxy"
        reason: ImagePullBackOff
  - image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8cc3384be7d81e745ce671c668465ceef75f65652354ce305d7bee3ae21a5976
    imageID: ""
    lastState: {}
    name: manager
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        message: secret "aso-controller-settings" not found
        reason: CreateContainerConfigError
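The two containers fail for different reasons (an image pull from registry.ci.openshift.org and a missing "aso-controller-settings" secret), which is easy to miss in the pod list. A small Python sketch like the following pulls the waiting reason and message out of a pod's status, for example from `oc get pod <name> -o json`; it assumes the JSON has already been loaded into a dict, and `waiting_containers` is a hypothetical helper.

```python
"""Sketch: list containers stuck in a waiting state, with reason and message."""


def waiting_containers(pod_status: dict) -> dict[str, tuple[str, str]]:
    """Map container name -> (reason, message) for containers in state.waiting."""
    result: dict[str, tuple[str, str]] = {}
    for status in pod_status.get("containerStatuses", []):
        waiting = status.get("state", {}).get("waiting")
        if waiting:
            result[status["name"]] = (waiting.get("reason", ""), waiting.get("message", ""))
    return result


# The statuses from the failing pod above, reduced to the fields used here.
POD_STATUS = {
    "containerStatuses": [
        {"name": "kube-rbac-proxy", "state": {"waiting": {
            "reason": "ImagePullBackOff",
            "message": 'Back-off pulling image "registry.ci.openshift.org/openshift:kube-rbac-proxy"'}}},
        {"name": "manager", "state": {"waiting": {
            "reason": "CreateContainerConfigError",
            "message": 'secret "aso-controller-settings" not found'}}},
    ]
}

if __name__ == "__main__":
    for name, (reason, message) in waiting_containers(POD_STATUS).items():
        print(f"{name}: {reason}: {message}")
```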

https://github.com/openshift/cluster-capi-operator/pull/141

Bug OCPBUGS-24019: MCO TLS artifacts should have ownership annotations

https://github.com/openshift/machine-config-operator/pull/4045

Bug OCPBUGS-24416: MachineConfigNode cannot be synced with node creation/deletion

Unknown MachineConfigNodes are listed whose names do not match any node in the current cluster. The cluster has 6 nodes, but 10 MachineConfigNodes are listed:

// current node
$ oc get node
NAME                                        STATUS   ROLES                  AGE     VERSION
ip-10-0-12-209.us-east-2.compute.internal   Ready    worker                 3h48m   v1.28.3+59b90bd
ip-10-0-23-177.us-east-2.compute.internal   Ready    control-plane,master   3h54m   v1.28.3+59b90bd
ip-10-0-32-216.us-east-2.compute.internal   Ready    control-plane,master   3h54m   v1.28.3+59b90bd
ip-10-0-42-207.us-east-2.compute.internal   Ready    worker                 53m     v1.28.3+59b90bd
ip-10-0-71-71.us-east-2.compute.internal    Ready    worker                 3h46m   v1.28.3+59b90bd
ip-10-0-81-190.us-east-2.compute.internal   Ready    control-plane,master   3h54m   v1.28.3+59b90bd

// current mcn
$ oc get machineconfignode
NAME                                        UPDATED   UPDATEPREPARED   UPDATEEXECUTED   UPDATEPOSTACTIONCOMPLETE   UPDATECOMPLETE   RESUMED
ip-10-0-12-209.us-east-2.compute.internal   True      False            False            False                      False            False
ip-10-0-23-177.us-east-2.compute.internal   True      False            False            False                      False            False
ip-10-0-32-216.us-east-2.compute.internal   True      False            False            False                      False            False
ip-10-0-42-207.us-east-2.compute.internal   True      False            False            False                      False            False
ip-10-0-53-5.us-east-2.compute.internal     True      False            False            False                      False            False
ip-10-0-56-84.us-east-2.compute.internal    True      False            False            False                      False            False
ip-10-0-58-210.us-east-2.compute.internal   True      False            False            False                      False            False
ip-10-0-58-99.us-east-2.compute.internal    False     True             True             Unknown                    False            False
ip-10-0-71-71.us-east-2.compute.internal    True      False            False            False                      False            False
ip-10-0-81-190.us-east-2.compute.internal   True      False            False            False                      False            False

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2023-12-04-162702

How reproducible:

Consistently

Steps to Reproduce:

1. Set up a cluster with 4.15.0-0.nightly-2023-12-04-162702 on AWS.
2. Enable featureSet: TechPreviewNoUpgrade.
3. Apply a file-based MachineConfig a few times.
4. Check the node list.
5. Check the MachineConfigNode list.

Actual results:

Some unknown MachineConfigNodes are listed.

Expected results:

The number of MachineConfigNodes should equal the number of cluster nodes.
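The expected invariant (one MachineConfigNode per node) amounts to a set difference on names. The Python sketch below uses the names from the listings above; `reconcile_mcns` is a hypothetical helper, not MCO code, and in a real controller the deletions might instead be driven by node delete events or owner references.

```python
"""Sketch: diff node names against MachineConfigNode names."""
from typing import Iterable


def reconcile_mcns(node_names: Iterable[str], mcn_names: Iterable[str]) -> tuple[list[str], list[str]]:
    """Return (MCNs to create, MCNs to delete) so that MCNs match nodes one to one."""
    nodes, mcns = set(node_names), set(mcn_names)
    return sorted(nodes - mcns), sorted(mcns - nodes)


SUFFIX = ".us-east-2.compute.internal"
# Node names from `oc get node` above.
NODES = [f"ip-10-0-{ip}{SUFFIX}" for ip in ("12-209", "23-177", "32-216", "42-207", "71-71", "81-190")]
# MCN names from `oc get machineconfignode` above: the six nodes plus four extras.
MCNS = NODES + [f"ip-10-0-{ip}{SUFFIX}" for ip in ("53-5", "56-84", "58-210", "58-99")]

if __name__ == "__main__":
    to_create, to_delete = reconcile_mcns(NODES, MCNS)
    print(to_create)  # []
    print(to_delete)  # the four MCNs with no matching node
```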

Additional info:

must-gather: https://drive.google.com/file/d/1-VTismwXXZ9sYMHi8hDL7vhwzjuMn92n/view?usp=drive_link

https://github.com/openshift/machine-config-operator/pull/4062

Bug OCPBUGS-10133: Update 4.14 ose-openstack-cinder-csi-driver image to be consistent with ART

Please review the following PR: https://github.com/openshift/cloud-provider-openstack/pull/187

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cloud-provider-openstack/pull/187

Bug OCPBUGS-18785: sdn-controller should never try to acquire a lease as "localhost.localdomain"

Description of problem:

During a highly escalated case, we found the following scenario:
- Due to an unrelated problem, 2 control plane nodes had "localhost.localdomain" hostname when their respective sdn-controller pods started (this problem would be out of the scope of this bug report).
- As both sdn-controller pods had (and retained) the "localhost.localdomain" hostname, this caused both of them to use "localhost.localdomain" while trying to acquire and renew the controller lease in openshift-network-controller configmap.
- This ultimately caused both sdn-controller pods to mistakenly believe that they were the active sdn-controller, so both of them were active at the same time.

Such a situation might have a number of undesired (and unknown) side effects. In our case, the result was that two nodes were allocated the same hostsubnet, disrupting pod communication between the 2 nodes and with the other nodes.

What we expect from this bug report: that the sdn-controller never tries to acquire a lease as "localhost.localdomain" during a failure scenario. The ideal solution would be to acquire the lease in a way that avoids collisions (more on this in the comments), but at the very least, the sdn-controller should prefer crash-looping to starting with a lease identity that can collide and wreak havoc.

Version-Release number of selected component (if applicable):

Found on 4.11, but it should be reproducible in 4.13 as well.

How reproducible:

Under error scenarios in which 2 control plane nodes temporarily have the "localhost.localdomain" hostname by mistake.

Steps to Reproduce:

1. Start the sdn-controller pods.

Actual results:

2 sdn-controller pods acquire the lease with "localhost.localdomain" holderIdentity and become active at the same time.

Expected results:

No sdn-controller pod acquires the lease with the "localhost.localdomain" holderIdentity. Either unique identities are used even in a failure scenario, or the pod crash-loops.
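Both expected behaviours (refuse a placeholder hostname, and make identities unique) can be sketched in Python as follows. `lease_identity` and the list of non-unique hostnames are hypothetical and illustrate the idea only; they are not the sdn-controller's code.

```python
"""Sketch: derive a leader-election holder identity that cannot collide."""
import socket
import uuid
from typing import Optional

# Hostnames that are not unique across nodes and must never be used as-is.
NON_UNIQUE_HOSTNAMES = {"", "localhost", "localhost.localdomain"}


def lease_identity(hostname: Optional[str] = None) -> str:
    """Return "<hostname>_<uuid>", or raise if the hostname is a placeholder."""
    name = (socket.gethostname() if hostname is None else hostname).strip()
    if name.lower() in NON_UNIQUE_HOSTNAMES:
        # Exiting on this error makes the pod crash-loop instead of competing
        # for the lease under an identity that another node may share.
        raise RuntimeError(f"refusing to acquire lease as non-unique hostname {name!r}")
    # A random suffix keeps identities distinct even if two hosts report the same name.
    return f"{name}_{uuid.uuid4()}"
```

One trade-off of the random suffix: a restarted pod gets a new identity, so it cannot reclaim its previous lease immediately and has to wait for that lease to expire like any other candidate.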

Additional info:

Just FYI, the trigger that caused the wrong hostname was investigated in this other bug: https://issues.redhat.com/browse/OCPBUGS-11997

However, this situation may happen under other possible failure scenarios, so it is worth preventing it somehow.

Bug OCPBUGS-22358: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/origin/pull/28353

Task MGMT-15732: [CI] Enable defaulting webhook for hypershift install

For OCP 4.14+, CI needs to pass --enable-defaulting-webhook true to the hypershift install command.

Reference: https://github.com/openshift/hypershift/pull/2922/files

Slack thread: https://redhat-internal.slack.com/archives/C014N2VLTQE/p1694090399430659

Bug OCPBUGS-24399: Remove unwanted list style bullets from dropdown menus

Description of problem:

After the PF5 upgrade, list-style bullets appeared on unordered lists in older components that use PF4 dropdown menus.

Version-Release number of selected component (if applicable):

How reproducible:

The Metrics Plugin still uses PF4 components and styling.

Additional info:

PatternFly removes list-style bullets or numbers from <ul>/<ol> elements by default and then adds them where needed.

The OCP console chose to override this because of the amount of <ul>/<ol> elements in our codebase that expect the default bullet or numbers to be present.

Bug screenshots
https://drive.google.com/drive/folders/1rP6Ls1R2GJoTArHg0oild5SWIWvNaMUv

https://github.com/openshift/console/pull/13406

Bug OCPBUGS-25604: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/ibm-vpc-block-csi-driver-operator/pull/103

Bug OCPBUGS-23467: 4.15.0-ec.2 and later should delete the validating-webhook-configuration ValidatingWebhookConfiguration

Description of problem:

The ValidatingWebhookConfiguration was renamed between ec.1 and ec.2:

$ oc adm release extract --to ec.1 quay.io/openshift-release-dev/ocp-release:4.15.0-ec.1-x86_64
$ oc adm release extract --to ec.2 quay.io/openshift-release-dev/ocp-release:4.15.0-ec.2-x86_64
$ yaml2json <ec.1/0000_30_cluster-api_10_webhooks.yaml | jq -r  .metadata.name
validating-webhook-configuration
$ yaml2json <ec.2/0000_30_cluster-api_10_webhooks.yaml | jq -r  .metadata.name
cluster-capi-operator

The presence of the old configuration breaks updates across the gap: the operator tries to act on resources that are still guarded by the old webhook configuration, even though nothing serves the hooks it points at anymore (or something like that). In any case, the cluster-api ClusterOperator goes Degraded=True with reason SyncingFailed and the following message until the old ValidatingWebhookConfiguration is deleted:

Failed to resync for operator: 4.15.0-ec.2 because &{%!e(string=unable to reconcile CoreProvider: unable to create or update CoreProvider: Internal error occurred: failed calling webhook "vcoreprovider.operator.cluster.x-k8s.io": failed to call webhook: the server could not find the requested resource)}

After that deletion, the ClusterOperator recovers.
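The cleanup amounts to deleting the configuration that survived under its old name. As a minimal Python sketch, assuming the list of existing ValidatingWebhookConfiguration names has already been fetched from the cluster, the stale ones can be identified from an old-to-new rename map; `stale_webhook_configs` and the map are hypothetical illustrations built from the manifest names above, not the fix in the linked PR.

```python
"""Sketch: find ValidatingWebhookConfigurations left behind by a rename."""
from typing import Iterable

# Old name -> new name, taken from the ec.1 / ec.2 manifests above.
RENAMED_WEBHOOK_CONFIGS = {"validating-webhook-configuration": "cluster-capi-operator"}


def stale_webhook_configs(existing: Iterable[str]) -> list[str]:
    """Return old-name configurations that still exist on the cluster and should be deleted."""
    present = set(existing)
    return sorted(old for old in RENAMED_WEBHOOK_CONFIGS if old in present)
```

Because "validating-webhook-configuration" is a generic name, a real cleanup would likely also confirm ownership (for example, through labels or annotations) before deleting anything.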

Version-Release number of selected component (if applicable):

4.15.0-ec.2.

How reproducible:

Untested, but I'd guess 100%.

Steps to Reproduce:

1. Install a tech-preview 4.15.0-ec.1 cluster.
2. Request an update to 4.15.0-ec.2.
3. Wait an hour or so.

Actual results:

The cluster-api ClusterOperator is Degraded=True, blocking further progress in the ClusterVersion update.

Expected results:

The ClusterVersion update completes successfully.

https://github.com/openshift/cluster-capi-operator/pull/145

Story OSASINFRA-2139: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/openstack-cinder-csi-driver-operator/pull/110

Bug OCPBUGS-19120: Update 4.15 openshift-state-metrics image to be consistent with ART

Please review the following PR: https://github.com/openshift/openshift-state-metrics/pull/102

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/openshift-state-metrics/pull/102

Bug OCPBUGS-23085: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/machine-config-operator/pull/4046

Bug OCPBUGS-25234: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/csi-operator/pull/83

Bug OCPBUGS-21791: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-autoscaler-operator/pull/295

Bug OCPBUGS-18439: Failure when creating operator-backed resources

Description of problem:

In the developer sandbox, the happy path to create operator-backed resources is broken.

Users can only work in their assigned namespace. When a user attempts to create an Operator-backed resource from the Developer console, the user interface inadvertently switches the working namespace from the user's namespace to `openshift`. The console shows an error message when the user clicks the "Create" button.

Version-Release number of selected component (if applicable):

How reproducible:

Always

Steps to Reproduce:

1. Login to the Developer Sandbox
2. Choose the Developer view
3. Click Add+ -> Developer Catalog -> Operator Backed
4. Filter by "integration"
5. Notice the working namespace is still the user's one. 
6. Select "Integration" (Camel K operator)
7. Click "Create"
8. Notice the working namespace has switched to `openshift`
9. Notice the custom resource in YAML view includes `namespace: openshift`
10. Click "Create"

Actual results:

An error message is shown: "Danger alert: An error occurred: integrations.camel.apache.org is forbidden: User "bmesegue" cannot create resource "integrations" in API group "camel.apache.org" in the namespace "openshift""

Expected results:

At step 8, the working namespace should remain the user's namespace.
On step 9, in the YAML view, the namespace should be the user's one, or none.
After step 10, the creation process should trigger the creation of a Camel K integration.
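The expected behaviour ("the user's namespace, or none") can be sketched in Python as a function that prepares the operand from an example custom resource, assumed here to come from the operator's catalog entry (for instance the CSV's alm-examples annotation). `prepare_operand` is hypothetical; the console itself is written in TypeScript and the actual fix is in the linked PR.

```python
"""Sketch: pin an operand's namespace to the user's active namespace."""
import copy
from typing import Optional


def prepare_operand(example: dict, active_namespace: Optional[str]) -> dict:
    """Return a copy of `example` whose metadata.namespace is the active namespace, or unset."""
    operand = copy.deepcopy(example)  # never mutate the catalog's example
    metadata = operand.setdefault("metadata", {})
    if active_namespace:
        metadata["namespace"] = active_namespace
    else:
        # No active namespace: leave it unset rather than inheriting the
        # namespace the operator is installed in (for example "openshift").
        metadata.pop("namespace", None)
    return operand
```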

Additional info:

https://github.com/openshift/console/pull/13132

Bug OCPBUGS-23543: Deployment option is missing in 'Deploy Image'

Description of problem:

The Deployment option is missing from the 'Click on the names to access advanced options' list on the Deploy Image page, so users can no longer set environment variables.

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2023-11-20-205649

How reproducible:

Always

Steps to Reproduce:

1. Log in to OCP, switch to the Developer perspective, and navigate to the Deploy Image page (+Add -> Container image):
   /deploy-image/ns/default
2. Scroll down and check whether 'Deployment' is in the advanced options list.

Actual results:

The Deployment option is missing from the advanced options list, so the user cannot update environment variables.

Expected results:

The Deployment option is present in the advanced options list.

Additional info:

https://drive.google.com/file/d/1ixQ33DdGzZTAWgzrpp57OqHGFS4v1_3T/view?usp=drive_link
https://drive.google.com/file/d/1dpgFtsr45IovSriwu0RPd0kq0DejRSAm/view?usp=drive_link

https://github.com/openshift/console/pull/13354

Bug OCPBUGS-19264: Update 4.15 ose-aws-pod-identity-webhook image to be consistent with ART

Please review the following PR: https://github.com/openshift/aws-pod-identity-webhook/pull/167

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/aws-pod-identity-webhook/pull/167

Bug OCPBUGS-24127: Update 4.15 ose-cluster-cloud-controller-manager-operator-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/302

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/302

Bug OCPBUGS-10562: Re-enable operator-uninstall.spec.ts

Description of problem:

Business Automation operands fail to load in the uninstall operator modal, which shows the alert "Cannot load Operands. There was an error loading operands for this operator. Operands will need to be deleted manually...".

The "Delete all operand instances for this operator__checkbox" element is not shown, so the test fails.

https://search.ci.openshift.org/?search=Testing+uninstall+of+Business+Automation+Operator&maxAge=168h&context=1&type=junit&name=pull-ci-openshift-console-master-e2e-gcp-console&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/console/pull/13214

Bug OCPBUGS-19688: agent-tui should work on the serial console

The agent-tui interface for editing the network config of the Agent ISO at boot time runs only on the graphical console (tty1). Running two copies is difficult, so for now this gives the most value when a graphical console is available.

However, when the host has only a serial console, there are two consequences:

There is no way to edit the network config.
Console output is frozen while agent-tui runs in the background. If the release image cannot be pulled, the freeze lasts forever, and the user never reaches the getty screen on the serial console that reports that the release image is not available.

Both situations could be resolved by allowing agent-tui to run on the serial console instead of the graphical console when there is no graphical console.
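One possible selection rule, sketched in Go under stated assumptions: the active kernel consoles are read from `/sys/class/tty/console/active` (a space-separated list such as `tty0 ttyS0`), a virtual terminal such as `tty0` indicates a graphical console, and anything else (for example `ttyS0`) is treated as serial. `pickConsole` is a hypothetical helper; the installer's actual change may detect consoles differently.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// isVirtualTerminal reports whether name is a kernel virtual terminal such as
// tty0 or tty1: "tty" followed only by digits. Serial ports look like ttyS0.
func isVirtualTerminal(name string) bool {
	if !strings.HasPrefix(name, "tty") || len(name) == len("tty") {
		return false
	}
	for _, r := range name[len("tty"):] {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// pickConsole chooses the device agent-tui should use, given the contents of
// /sys/class/tty/console/active. It prefers the graphical console (tty1) and
// falls back to the first listed console, typically serial. It returns ""
// when no console is listed.
func pickConsole(active string) string {
	consoles := strings.Fields(active)
	for _, c := range consoles {
		if isVirtualTerminal(c) {
			return "/dev/tty1"
		}
	}
	if len(consoles) > 0 {
		return "/dev/" + consoles[0]
	}
	return ""
}

func main() {
	data, err := os.ReadFile("/sys/class/tty/console/active")
	if err != nil {
		fmt.Println("cannot read active consoles:", err)
		return
	}
	fmt.Println("agent-tui console:", pickConsole(string(data)))
}
```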

https://github.com/openshift/installer/pull/7526

Bug OCPBUGS-25240: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-csi-snapshot-controller-operator/pull/180

Bug OCPBUGS-16550: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/vmware-vsphere-csi-driver/pull/95

Bug OCPBUGS-23475: [Reliability][regression]multus pods memory increased from <100M to 700+M in 7 days

In the Reliability test (long-running, with a stable load), memory usage of the 3 multus pods increased from <100 MiB to 700+ MiB over 7 days.

The multus pods request 65Mi of memory and have no memory limit. If the test runs longer and memory keeps increasing, this issue can affect node resources.

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2023-11-13-174800

How reproducible:

Seen for the first time; this did not occur in the 4.14 Reliability test.

Steps to Reproduce:

1. Install an AWS compact cluster with 3 master nodes that also act as worker nodes.
2. Run the reliability-v2 test (https://github.com/openshift/svt/tree/master/reliability-v2). The test runs for a long time and simulates usage of the cluster by multiple customers.
Config: 1 admin, 5 dev-test, 5 dev-prod, 1 dev-cron.
3. Monitor the metrics: container_memory_rss{container="kube-multus",namespace="openshift-multus"}

Actual results:

Memory usage of the 3 multus pods increased from <100 MiB to 700+ MiB over 7 days.
After the test load stopped, the memory increase stopped, but usage did not drop.

Expected results:

Memory usage should not increase continuously.

Additional info:

% oc adm top pod -n openshift-multus --containers=true --sort-by memory -l app=multus
POD            NAME          CPU(cores)   MEMORY(bytes)
multus-xp474   kube-multus   12m          1275Mi
multus-xp474   POD           0m           0Mi
multus-xt64s   kube-multus   21m          971Mi
multus-xt64s   POD           0m           0Mi
multus-d9xcs   kube-multus   6m           757Mi
multus-d9xcs   POD           0m           0Mi
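To compare the usage above with the 65Mi request, the `oc adm top` output can be parsed mechanically. A minimal Go sketch, assuming the four-column layout and Mi units shown above; `parseTop` is a hypothetical convenience for reading the numbers, not part of the fix.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// containerMemory is one row of `oc adm top pod --containers` output.
type containerMemory struct {
	Pod, Container string
	MemoryMi       int
}

// parseTop parses rows of the form "POD NAME CPU MEMORY", skipping the header
// and any line that does not have exactly four columns. It assumes memory
// values are reported in Mi and returns an error for any other unit.
func parseTop(out string) ([]containerMemory, error) {
	var rows []containerMemory
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) != 4 || f[0] == "POD" {
			continue
		}
		if !strings.HasSuffix(f[3], "Mi") {
			return nil, fmt.Errorf("unexpected memory unit in %q", f[3])
		}
		mi, err := strconv.Atoi(strings.TrimSuffix(f[3], "Mi"))
		if err != nil {
			return nil, err
		}
		rows = append(rows, containerMemory{Pod: f[0], Container: f[1], MemoryMi: mi})
	}
	return rows, sc.Err()
}

func main() {
	const requestMi = 65 // kube-multus memory request from the bug report
	out := `POD            NAME          CPU(cores)   MEMORY(bytes)
multus-xp474   kube-multus   12m          1275Mi
multus-xp474   POD           0m           0Mi
multus-xt64s   kube-multus   21m          971Mi`
	rows, err := parseTop(out)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, r := range rows {
		if r.Container == "kube-multus" {
			fmt.Printf("%s: %dMi (%.1fx request)\n", r.Pod, r.MemoryMi, float64(r.MemoryMi)/requestMi)
		}
	}
}
```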

The monitoring screenshots:

multus-memory-increase.png

multus-memory-increase-stop.png

Must-gather: must-gather.local.4628887688332215806.tar.gz

https://github.com/openshift/multus-cni/pull/201

Bug OCPBUGS-25734: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/machine-api-operator/pull/1193

Bug OCPBUGS-4242: OCP 4.11 console UI is not consistent in showing what namespaces are managed

Description of problem:

OCP 4.11 console UI is not consistent in showing what namespaces are managed.

Below are the results; the respective images are attached:

1. Viewing installed operators for the cp4i namespace shows the multi-namespace operators as managing All Namespaces, although these operators are actually restricted to 2 namespaces (image multins-cp4i.png).
2. Viewing installed operators for the ibm-common-services namespace shows the multi-namespace operators as managing 2 namespaces (image multins-ibm-cs.png).
3. Viewing installed operators for All Projects shows the multi-namespace operators as managing 2 namespaces (image multins-all.p).

Slack thread: https://coreos.slack.com/archives/C6A3NV5J9/p1668535310411939

How reproducible:

1. Install an operator into the "cp4i" namespace (the OperatorGroup is OwnNamespace with just "cp4i").

2. Install one or more operators into the "ibm-common-services" namespace (the OperatorGroup is OwnNamespace with just "ibm-common-services").

3. Edit the OperatorGroup in the "ibm-common-services" namespace and add the "cp4i" namespace. The operators in "ibm-common-services" are now included in both the "ibm-common-services" and "cp4i" namespaces.

4. Review the installed operators in the OCP 4.11 console for "cp4i", "ibm-common-services", and "All Projects".

Actual results:

The Installed Operators page for the cp4i project incorrectly shows Managed Namespaces as "All Namespaces" (see image multins-cp4i.png).

Expected results:

The Installed Operators page for the cp4i project correctly shows the managed namespaces.

Additional info:


https://github.com/openshift/console/pull/13194

Bug OCPBUGS-24085: Update 4.15 ose-cluster-kube-cluster-api-operator-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-api-operator/pull/31

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-api-operator/pull/31

Bug OCPBUGS-19246: Update 4.15 ose-csi-driver-shared-resource-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/csi-driver-shared-resource-operator/pull/84

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/csi-driver-shared-resource-operator/pull/84

Bug OCPBUGS-21936: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/coredns/pull/101

Bug OCPBUGS-23650: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-api-provider-gcp/pull/217

Bug OCPBUGS-18984: Potentially inconsistent snapshots taken from UpgradeBackupController on z releases

Description of problem:

With the fix for BZ 2079803 [1], we introduced a backup trigger on every z-release (instead of every y-release). Unfortunately, we did not update the CVO [2] logic, which blocks the upgrade until a snapshot has been taken, to cover z-releases as well.

Currently we have a split state machine (thanks Trevor):

... today we have this for minor updates:
1. User bumps ClusterVersion spec asking for a minor update
2. CVO checks for a recent etcd backup.  Until it is available, we refuse to accept the retarget request.
3. Once the etcd backup is available (assuming no other precondition issues), we accept the retarget and start updating.

While for patch updates:
1. User bumps ClusterVersion spec asking for a patch update.
2. CVO accepts the retarget, sets status.desired, and starts the update.


In the latter case, the CEO might take a snapshot while the upgrade is already running (race condition). This creates an inconsistent snapshot which, on restore, would simply re-attempt the (botched) upgrade.


[1] https://github.com/openshift/cluster-etcd-operator/pull/835
[2] https://github.com/openshift/cluster-version-operator/blob/master/pkg/payload/precondition/clusterversion/etcdbackup.go#L76-L77

Version-Release number of selected component (if applicable):

any OCP > 4.10

How reproducible:

almost always (race condition between CEO and CVO)

Steps to Reproduce:

1. Trigger a z-upgrade.
2. Observe when the etcd backup is taken; it might happen after the upgrade is already in progress.

Actual results:

The snapshot that was created contains parts of the newly upgraded OCP (CVO CRD or any other operator state).

Expected results:

The snapshot should not contain any information that could come through with the z-upgrade.

Additional info:

Either the CVO should also wait on z-upgrades to ensure the snapshots are consistently on a pre-upgrade state, or we revert the z-stream upgrade behavior again.
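The split state machine, and the first option proposed above, can be modelled as a single precondition. A hypothetical Go sketch; `version` and `acceptRetarget` are illustrative names, not the CVO's precondition code in `etcdbackup.go`. As recorded below, the team ultimately chose to remove the controller instead.

```go
package main

import "fmt"

// version is an OpenShift release version (major.minor.patch).
type version struct{ Major, Minor, Patch int }

// isMinorUpdate reports whether moving from -> to changes the y-stream.
func isMinorUpdate(from, to version) bool {
	return from.Major != to.Major || from.Minor != to.Minor
}

// acceptRetarget models whether the CVO accepts a ClusterVersion retarget.
// With backupForPatch=false it reproduces the split behavior described above:
// only minor updates wait for a recent etcd backup. With backupForPatch=true
// it models the proposal that z-upgrades wait as well.
func acceptRetarget(from, to version, recentBackup, backupForPatch bool) bool {
	if isMinorUpdate(from, to) || backupForPatch {
		return recentBackup
	}
	return true
}

func main() {
	from := version{4, 13, 20}
	patch := version{4, 13, 21}
	// true: the update starts before the backup exists, so the race is possible.
	fmt.Println("today, patch update, no backup yet:", acceptRetarget(from, patch, false, false))
	// false: the retarget waits for the backup.
	fmt.Println("proposal, patch update, no backup yet:", acceptRetarget(from, patch, false, true))
}
```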

—

William Caban and our team decided to entirely remove the controller.

W. Trevor King will drop the requirement in the CVO.

Bug OCPBUGS-24145: Update 4.15 ose-cluster-openshift-apiserver-operator-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-openshift-apiserver-operator/pull/560

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-openshift-apiserver-operator/pull/560

Bug OCPBUGS-24131: Update 4.15 csi-driver-nfs-container image to be consistent with ART

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/csi-driver-nfs/pull/135

Bug TRT-1274: Collect Azure disk metrics and create intervals to show on spyglass

reason/DisruptionBegan request-audit-id/91e612b4-dd19-4783-ad62-46c55bbdaee4 backend-disruption-name/oauth-api-reused-connections connection/reused disruption/openshift-tests stopped responding to GET requests over reused connections: error running request: 500 Internal Server Error: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"etcdserver: leader changed","code":500}

https://grafana-loki.ci.openshift.org/explore?orgId=1&left=%7B%22datasource%22:%22PCEB727DF2F34084E%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22PCEB727DF2F34084E%22%7D,%22editorMode%22:%22code%22,%22expr%22:%22%7Btype%3D%5C%22origin-interval%5C%22,invoker%3D~%5C%22.%2A4.14.%2Aazure.%2A%5C%22%7D%20%20%7C%20unpack%20%7C~%20%5C%22disruption%5C%22%20%7C~%20%5C%22500%20Internal%20Server%20Error.%2Aleader%20changed%5C%22%5Cn%22,%22queryType%22:%22range%22%7D%5D,%22range%22:%7B%22from%22:%22now-2d%22,%22to%22:%22now%22%7D%7D

Feels like there's something here we could dig into.

Most common on Azure.

May show up in search.ci as well to help find the jobs more easily?

https://github.com/openshift/origin/pull/28375

Bug OCPBUGS-19861: Multus annotation permissions: CNO should configure 24h cert for multus

Description of problem: Multus currently uses a certificate that is valid for 10 minutes; CNO needs to add configuration for 24-hour certificates.

https://github.com/openshift/cluster-network-operator/pull/2039

Bug OCPBUGS-18003: Outgoing traffic through EgressRouter is broken

Description of problem:

The automated case OCP-42340 failed in a CI job on version 4.14.0-ec.4, and the issue was then reproduced on 4.14.0-0.nightly-2023-08-22-221456.

Version-Release number of selected component (if applicable):

4.14.0-ec.4 4.14.0-0.nightly-2023-08-22-221456

How reproducible:

Always

Steps to Reproduce:

1. Deploy an EgressRouter on bare metal with the following manifest:
{
    "kind": "List",
    "apiVersion": "v1",
    "metadata": {},
    "items": [
        {
            "apiVersion": "network.operator.openshift.io/v1",
            "kind": "EgressRouter",
            "metadata": {
                "name": "egressrouter-42430",
                "namespace": "e2e-test-networking-egressrouter-l4xgx"
            },
            "spec": {
                "addresses": [
                    {
                        "gateway": "192.168.111.1",
                        "ip": "192.168.111.55/24"
                    }
                ],
                "mode": "Redirect",
                "networkInterface": {
                    "macvlan": {
                        "mode": "Bridge"
                    }
                },
                "redirect": {
                    "redirectRules": [
                        {
                            "destinationIP": "142.250.188.206",
                            "port": 80,
                            "protocol": "TCP"
                        },
                        {
                            "destinationIP": "142.250.188.206",
                            "port": 8080,
                            "protocol": "TCP",
                            "targetPort": 80
                        },
                        {
                            "destinationIP": "142.250.188.206",
                            "port": 8888,
                            "protocol": "TCP",
                            "targetPort": 80
                        }
                    ]
                }
            }
        }
    ]
}

 % oc get pods -n  e2e-test-networking-egressrouter-l4xgx -o wide
NAME                                           READY   STATUS    RESTARTS   AGE   IP            NODE       NOMINATED NODE   READINESS GATES
egress-router-cni-deployment-c4bff88cf-skv9j   1/1     Running   0          69m   10.131.0.26   worker-0   <none>           <none>

2. Create a Service that points to the EgressRouter:
% oc get svc -n e2e-test-networking-egressrouter-l4xgx -o yaml  
apiVersion: v1
items:
- apiVersion: v1
  kind: Service
  metadata:
    creationTimestamp: "2023-08-23T05:58:30Z"
    name: ovn-egressrouter-multidst-svc
    namespace: e2e-test-networking-egressrouter-l4xgx
    resourceVersion: "50383"
    uid: 07341ff1-6df3-40a6-b27e-59102d56e9c1
  spec:
    clusterIP: 172.30.10.103
    clusterIPs:
    - 172.30.10.103
    internalTrafficPolicy: Cluster
    ipFamilies:
    - IPv4
    ipFamilyPolicy: SingleStack
    ports:
    - name: con1
      port: 80
      protocol: TCP
      targetPort: 80
    - name: con2
      port: 5000
      protocol: TCP
      targetPort: 8080
    - name: con3
      port: 6000
      protocol: TCP
      targetPort: 8888
    selector:
      app: egress-router-cni
    sessionAffinity: None
    type: ClusterIP
  status:
    loadBalancer: {}
kind: List
metadata:
  resourceVersion: ""

3. Create a test pod and access the Service, or curl the EgressRouter IP:port directly:
oc rsh -n e2e-test-networking-egressrouter-l4xgx hello-pod1                                  
~ $ curl 172.30.10.103:80 --connect-timeout 5
curl: (28) Connection timeout after 5001 ms
~ $ curl 10.131.0.26:80 --connect-timeout 5
curl: (28) Connection timeout after 5001 ms
~ $ curl 10.131.0.26:8080 --connect-timeout 5
curl: (28) Connection timeout after 5001 ms
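For reference, the forwarding that the curl commands in step 3 expect can be written out as a lookup over the redirect rules in step 1. A Go sketch under one assumption: an omitted `targetPort` means traffic goes to the same `port` on the destination. `redirectRule` and `forwardTarget` are hypothetical and do not reproduce egress-router-cni's implementation. Under that reading, every connection in step 3 should reach port 80 on 142.250.188.206 rather than time out.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// redirectRule mirrors one entry of spec.redirect.redirectRules.
// TargetPort 0 means the field was omitted.
type redirectRule struct {
	DestinationIP string
	Port          int
	Protocol      string
	TargetPort    int
}

// forwardTarget returns the destination host:port for traffic that reaches the
// egress router on port/protocol, assuming an omitted targetPort defaults to port.
func forwardTarget(rules []redirectRule, port int, protocol string) (string, bool) {
	for _, r := range rules {
		if r.Port != port || !strings.EqualFold(r.Protocol, protocol) {
			continue
		}
		target := r.TargetPort
		if target == 0 {
			target = r.Port
		}
		return net.JoinHostPort(r.DestinationIP, strconv.Itoa(target)), true
	}
	return "", false
}

func main() {
	// Rules from the EgressRouter manifest in step 1.
	rules := []redirectRule{
		{DestinationIP: "142.250.188.206", Port: 80, Protocol: "TCP"},
		{DestinationIP: "142.250.188.206", Port: 8080, Protocol: "TCP", TargetPort: 80},
		{DestinationIP: "142.250.188.206", Port: 8888, Protocol: "TCP", TargetPort: 80},
	}
	// Service ports 80, 5000 and 6000 target router ports 80, 8080 and 8888.
	for _, p := range []int{80, 8080, 8888} {
		dst, _ := forwardTarget(rules, p, "TCP")
		fmt.Printf("router port %d -> %s\n", p, dst)
	}
}
```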

Actual results:

The connection fails.

Expected results:

The connection succeeds.

Additional info:
Note: the issue did not exist in 4.13; the test passed on the latest 4.13 nightly build, 4.13.0-0.nightly-2023-08-11-101506.

08-23 15:26:16.955  passed: (1m3s) 2023-08-23T07:26:07 "[sig-networking] SDN ConnectedOnly-Author:huirwang-High-42340-Egress router redirect mode with multiple destinations."

https://github.com/openshift/egress-router-cni/pull/77

Bug OCPBUGS-18598: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/ovn-kubernetes/pull/1923

Bug OCPBUGS-18772: MCO keeps attempting to pull baremetalRuntimeCfg image again and again

MCO installs the resolve-prepender NetworkManager script on the nodes. To find out node details, the script needs baremetalRuntimeCfgImage. However, the image needs to be pulled only the first time; on subsequent runs the script should just verify that the image is available.

Pulling the image again is not desirable when the mirror or quay.io is unavailable or having a temporary problem; such issues should not prevent the node from starting kubelet. During certificate rotation testing, I noticed that a node with a significant time skew won't start kubelet, because the script tries to pull baremetalRuntimeCfgImage before kubelet can start, even though the image is already on the node and does not need refreshing.

Task MGMT-16039: Upgrade to golang 1.20

Upgrade to golang 1.20 for all assisted-installer components

Bug OCPBUGS-19743: Console couldn't be started anymore with local bridge (error: Invalid URL)

After https://github.com/openshift/console/pull/13102 got merged, it isn't possible to start the local console bridge anymore.

The UI crashes with this error:

Uncaught TypeError: Failed to construct 'URL': Invalid URL
    at ./public/module/auth.js (main-c115e44b78283c32bc69.js:81514:7)
    at __webpack_require__ (runtime~main-bundle.js:90:30)

The loginErrorURL is a relative path, which new URL() cannot parse without a base URL:

window.SERVER_FLAGS.loginErrorURL
'/auth/error'

new URL(window.SERVER_FLAGS.loginErrorURL)
VM55:1 Uncaught TypeError: Failed to construct 'URL': Invalid URL

https://github.com/openshift/console/pull/13192

Bug MGMT-15425: [Staging] MCE operator installation version is 2.3 only

Description of the problem:

The MCE operator installation version is always 2.3. It should be dynamic and depend on the OCP version:

ocp_mce_version_matrix:
'4.14': '2.4'
'4.13': '2.3'
'4.12': '2.2'
'4.11': '2.1'
'4.10': '2.0'

How reproducible:

100%

Steps to reproduce:

1. Create a 4.12 cluster

2. Select MCE operator to be installed on cluster

3. Install cluster

4. Verify OCP and MCE versions

Actual results:

OCP 4.12.26, MCE 2.3.0

It looks like the service always installs 2.3 and does not consider the OCP version:
https://github.com/openshift/assisted-service/blob/master/internal/operators/mce/config.go

const (
    MceMinOpenshiftVersion string = "4.10.0"
    MceChannel             string = "stable-2.3"

Expected results:
MCE 2.2

MCE installation version should be dynamic and depends on OCP version

ocp_mce_version_matrix:
'4.14': '2.4'
'4.13': '2.3'
'4.12': '2.2'
'4.11': '2.1'
'4.10': '2.0'
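A minimal Go sketch of the expected lookup, assuming the channel naming follows the existing `stable-2.3` constant in `config.go` (that is, `stable-<MCE version>`); `mceChannel` and `mceVersionByOCP` are hypothetical names, not the fix merged in the linked PR.

```go
package main

import (
	"fmt"
	"strings"
)

// mceVersionByOCP mirrors ocp_mce_version_matrix from this bug.
var mceVersionByOCP = map[string]string{
	"4.14": "2.4",
	"4.13": "2.3",
	"4.12": "2.2",
	"4.11": "2.1",
	"4.10": "2.0",
}

// mceChannel maps an OCP version such as "4.12.26" to an MCE subscription
// channel, assuming the "stable-<MCE version>" naming of MceChannel.
func mceChannel(ocpVersion string) (string, error) {
	parts := strings.SplitN(ocpVersion, ".", 3)
	if len(parts) < 2 {
		return "", fmt.Errorf("invalid OpenShift version %q", ocpVersion)
	}
	minor := parts[0] + "." + parts[1]
	mce, ok := mceVersionByOCP[minor]
	if !ok {
		return "", fmt.Errorf("no MCE version mapped for OpenShift %s", minor)
	}
	return "stable-" + mce, nil
}

func main() {
	ch, err := mceChannel("4.12.26")
	fmt.Println(ch, err) // stable-2.2 <nil>
}
```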

https://github.com/openshift/assisted-service/pull/5716

Bug OCPBUGS-23377: Cluster-version operator "Running sync"/"Done syncing" steady-state log volume

Description of problem:

The cluster-version operator is very chatty, and this can cause problems in clusters where logs are shipped off to external storage. We worked on this in rbhz#2034493, which taught 4.10 and later to move to level 2 logging, mostly to drop the client-side throttling messages. And we have been pushing OTA-923 to make logging tunable, to avoid the need to make "will we want to hear about this?" decisions in one place for all clusters at all times. But there is interest in reducing the amount of logging in older releases in ways that do not require a tunable knob, and this bug tracks another step in that direction: the Running sync / Done syncing messages.

Version-Release number of selected component (if applicable):

All 4.y releases log these lines at high volume, but 4.10 and earlier are end-of-life, and 4.11 and 4.12 are in maintenance mode.

How reproducible:

Every time.

Steps to Reproduce:

1. Install a cluster.
2. Wait at least 30m since install or the most recent update completes, because we want the CVO to be chatty during those exciting times, and this bug is about steady-state log volume.
3. Collect CVO logs for the past 30m: oc -n openshift-cluster-version logs -l k8s-app=cluster-version-operator --since=40m --tail=-1 >cvo.log.

Actual results:

$ oc adm upgrade
Cluster version is 4.13.21
...
$ oc -n openshift-cluster-version logs -l k8s-app=cluster-version-operator --since=40m --tail=-1 > cvo.log
$ grep -o 'apply.*in state.*' cvo.log | uniq -c
     10 apply: 4.13.21 on generation 77 in state Reconciling at attempt 0
$ wc cvo.log 
  20043  242930 3071956 cvo.log
$ sed -n 's/^.* \([^ ]*[.]go:[0-9]*\).*/\1/p' cvo.log | sort | uniq -c | sort -n | tail -n5
    194 sync_worker.go:490
    314 sync_worker.go:978
    807 task_graph.go:477
   7971 sync_worker.go:1007
   7973 sync_worker.go:987
$ grep 'sync_worker.go:987' cvo.log | tail -n2
I1116 22:10:08.739999       1 sync_worker.go:987] Running sync for serviceaccount "openshift-cloud-credential-operator/cloud-credential-operator" (271 of 842)
I1116 22:10:08.785081       1 sync_worker.go:987] Running sync for flowschema "openshift-apiserver" (457 of 842)
$ grep 'sync_worker.go:1007' cvo.log | tail -n2
I1116 22:10:08.739967       1 sync_worker.go:1007] Done syncing for configmap "openshift-cloud-credential-operator/cco-trusted-ca" (270 of 842)
I1116 22:10:08.785043       1 sync_worker.go:1007] Done syncing for flowschema "openshift-apiserver-sar" (456 of 842)

So that's 3071956 bytes / 30 minutes * 60 minutes / 1 hour ~= 6 MB / hour, the bulk of which is Running sync and Done syncing logs.

Expected results:

$ grep -v 'sync_worker.go:\(987\|1007\)]' cvo.log | wc
   4099   51602  861709

So something closer to 861709 bytes / 30 minutes * 60 minutes / 1 hour ~= 2 MB / hour would be acceptable.

Additional info:

The CVO has a randomized sleep to cool off between sync cycles, and per-sync-cycle log volume will depend on (among other things) what that CVO container happened to choose for that sleep.

https://github.com/openshift/cluster-version-operator/pull/997

Bug OCPBUGS-24097: Update 4.15 ose-machine-api-provider-openstack-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/machine-api-provider-openstack/pull/99

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/machine-api-provider-openstack/pull/99

Story OSASINFRA-3199: openshift-install should configure User-Agent

openshift-install makes many calls to OpenStack APIs when installing OpenShift on OpenStack. Currently all of these calls use the same default User-Agent header gophercloud/x.y.z, where x.y.z is the version of the gophercloud that openshift-install was built with.

Keystone logs the User-Agent string, as do other OpenStack services, and it can provide important information about who is interacting with the cloud. As recently seen in OCPBUGS-14049, it can also be useful when debugging issues with components.

We should configure the User-Agent header for openshift-install and all other OpenShift components that talk to OpenStack APIs.
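For a Go client, one generic way to do this is an `http.RoundTripper` that stamps the header on every request. A minimal sketch using only the standard library; gophercloud's `ProviderClient` also exposes a `UserAgent` field with a `Prepend` method, which is probably the more natural hook, but check the gophercloud version in use. The `openshift-install/...` strings are hypothetical values.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// userAgentTransport sets a fixed User-Agent header on every request.
type userAgentTransport struct {
	userAgent string
	base      http.RoundTripper
}

// RoundTrip clones the request (a RoundTripper must not modify its input)
// and overrides the User-Agent header before delegating to base.
func (t *userAgentTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	r := req.Clone(req.Context())
	if r.Header == nil {
		r.Header = make(http.Header)
	}
	r.Header.Set("User-Agent", t.userAgent)
	return t.base.RoundTrip(r)
}

// echoUserAgent sends one request through the transport to a local test server
// that replies with the User-Agent it received.
func echoUserAgent(userAgent string) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, r.UserAgent())
	}))
	defer srv.Close()

	client := &http.Client{Transport: &userAgentTransport{userAgent: userAgent, base: http.DefaultTransport}}
	resp, err := client.Get(srv.URL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	// Hypothetical value; a real string would carry the installer version.
	ua, err := echoUserAgent("openshift-install/4.15.0")
	fmt.Println(ua, err) // openshift-install/4.15.0 <nil>
}
```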

https://github.com/openshift/installer/pull/7548

Bug OCPBUGS-19145: Update 4.15 ose-openshift-apiserver image to be consistent with ART

Please review the following PR: https://github.com/openshift/openshift-apiserver/pull/390

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/openshift-apiserver/pull/390

Bug OCPBUGS-19221: Update 4.15 ose-ibmcloud-machine-controllers image to be consistent with ART

Please review the following PR: https://github.com/openshift/machine-api-provider-ibmcloud/pull/24

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/machine-api-provider-ibmcloud/pull/24

Bug OCPBUGS-21973: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/router/pull/529

Bug OCPBUGS-19147: Update 4.15 ose-azure-cluster-api-controllers image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-api-provider-azure/pull/282

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-api-provider-azure/pull/282

Story TRT-1375: MCDPivotError firing on GCP

This is going to block the next payload: it failed 10/10 runs. Payload: https://amd64.ocp.releases.ci.openshift.org/releasestream/4.15.0-0.nightly/release/4.15.0-0.nightly-2023-11-30-112918

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/aggregated-gcp-ovn-rt-upgrade-4.15-minor-release-openshift-release-analysis-aggregator/1730187997023834112

Suspected cause:

https://github.com/openshift/machine-config-operator/pull/3965/files

https://github.com/openshift/machine-config-operator/pull/4051

Bug OCPBUGS-19406: Fix script rh-manifest.sh in Openshift/Thanos

Description of problem:

The script rh-manifest.sh in Openshift/Thanos has stopped working and generates an empty dependency list.

Version-Release number of selected component (if applicable):

How reproducible:

Run script/rh-manifest.sh in Openshift/Thanos and check rh-manifest.txt.

Steps to Reproduce:

1.
2.
3.

Actual results:

The generated rh-manifest.txt is empty.

Expected results:

The generated rh-manifest.txt should list JavaScript dependencies.

Additional info:

https://github.com/openshift/thanos/pull/120

Bug OCPBUGS-23565: Bump to kubernetes 1.28.4

This fix contains the following changes from the updated Kubernetes version, up to v1.28.4:

Changelog:
v1.28.4: https://github.com/kubernetes/kubernetes/blob/release-1.28/CHANGELOG/CHANGELOG-1.28.md#changelog-since-v1283

https://github.com/openshift/kubernetes/pull/1806

Bug OCPBUGS-24340: gather extra job_metrics.json is empty

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/cluster-monitoring-operator/pull/2178

Bug OCPBUGS-18122: ForceUpgradeTo Annotation should override current upgrade

Description of problem:

There is currently no way to interrupt a stuck HostedCluster upgrade because we don't allow another upgrade until the current upgrade is finished. At the very least we should allow overriding the upgrade with the ForceUpgradeTo annotation.

The name of the function that performs this check does not reflect its behavior.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Install a hosted cluster.
2. Start an upgrade to a bad release that will not complete.
3. Attempt to override the current upgrade with a different release via the ForceUpgradeTo annotation.

Actual results:

The override upgrade is not applied because the initial upgrade has not completed.

Expected results:

The override upgrade starts and completes successfully.
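A hypothetical Go sketch of the requested decision, assuming the annotation value must name the release image being forced. The key `hypershift.openshift.io/force-upgrade-to` is written from memory and should be verified against the HyperShift API; `canStartUpgrade` is not the code at the linked `hostedcluster_controller.go` lines.

```go
package main

import "fmt"

// forceUpgradeToAnnotation is the HostedCluster annotation key assumed here;
// verify it against the HyperShift API before relying on it.
const forceUpgradeToAnnotation = "hypershift.openshift.io/force-upgrade-to"

// canStartUpgrade reports whether a rollout to desiredImage may begin.
// Current behavior per the bug: blocked until the previous upgrade completes.
// Requested behavior: the force annotation naming desiredImage lifts that block.
func canStartUpgrade(annotations map[string]string, desiredImage string, previousComplete bool) bool {
	if previousComplete {
		return true
	}
	return desiredImage != "" && annotations[forceUpgradeToAnnotation] == desiredImage
}

func main() {
	good := "quay.io/example/release:good" // hypothetical release image
	ann := map[string]string{forceUpgradeToAnnotation: good}
	fmt.Println(canStartUpgrade(ann, good, false)) // true: stuck upgrade overridden
	fmt.Println(canStartUpgrade(nil, good, false)) // false: still blocked
}
```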

Additional info:

https://github.com/openshift/hypershift/blob/572a75655f0d86d6e2139f27e14eb1b168a5842b/hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go#L4123-L4135

https://github.com/openshift/hypershift/pull/2955

Bug OCPBUGS-18932: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-openshift-controller-manager-operator/pull/302

Bug OCPBUGS-23079: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/origin/pull/28388

Bug OCPBUGS-18401: String filter on events page doesn't work well

Description of problem:

Go to the Home -> Events page and type a string in the filter field: the events are not filtered. (The search mode is fuzzy search by default.)

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-08-28-154013

How reproducible:

Always

Steps to Reproduce:

1. Go to the Home -> Events page and type a string in the filter field.

Actual results:

1. The events are not filtered.

Expected results:

1. Only events containing the filter string should be shown.

Additional info:

The type filter does work on the Events page.
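As a reference for the expected behaviour, here is a minimal sketch of a fuzzy filter, assuming that "fuzzy" means a case-insensitive, in-order subsequence match. The console's real matcher may use different rules; `fuzzy_match` and `filter_events` are hypothetical helpers.

```python
from typing import List


def fuzzy_match(needle: str, haystack: str) -> bool:
    """True if every character of needle appears in haystack in order
    (case-insensitive subsequence match)."""
    it = iter(haystack.lower())
    # `ch in it` advances the shared iterator past the first match, so each
    # later character must appear after the previous one.
    return all(ch in it for ch in needle.lower())


def filter_events(messages: List[str], query: str) -> List[str]:
    """Keep only messages that fuzzily match query; an empty query keeps all."""
    if not query:
        return list(messages)
    return [m for m in messages if fuzzy_match(query, m)]


events = ["Pulled image", "Created container", "Started container"]
print(filter_events(events, "cont"))  # ['Created container', 'Started container']
```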

Bug OCPBUGS-20016: Annotation and label modals do not update after opening

Description of problem:

Once the annotation or labels modals are opened, any changes to the underlying resources will not be reflected in the modal.

Version-Release number of selected component (if applicable):

4.15

How reproducible:

Always

Steps to Reproduce:

1. Log in to a cluster as kubeadmin via the CLI and the console
2. Create a project named test
3. Visit the namespaces list page in the console (Administration > Namespaces)
4. Click "Edit annotations" via the kebab menu for namespace "test"
5. From the CLI, run the command:
  oc annotate namespace test foo=bar
6. Observe that the annotation modal did not update
7. Click Cancel to close the annotation modal
8. Open the annotation modal again and observe that the annotation added from the CLI is now shown
9. Repeat steps 5-8 using the labels modal and the command:
  oc label namespace test baz=qux

Actual results:

Annotation and labels modals do not update when the underlying resource labels or annotations change.

Expected results:

We should handle this case in some way

Additional info:

We can't necessarily just update the currently displayed data, as this could cause data loss or conflicts. 

The current behavior can also cause data loss in this situation:
- the user opens the modal
- a background update to the annotations/labels occurs
- the user makes their own change and saves
- the annotations/labels from the background update are lost (overwritten)
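The data-loss sequence above is a classic lost update. The sketch below, with hypothetical helper names, contrasts writing back the whole edited map (the behaviour described) with applying only the user's own changes to the latest server state. It is one possible approach, not the fix the console adopted.

```python
from typing import Dict

Annotations = Dict[str, str]


def save_by_replacement(edited: Annotations) -> Annotations:
    """Behaviour described above: the modal writes back the full map it edited,
    so anything changed on the server since the modal opened is overwritten."""
    return dict(edited)


def save_by_delta(opened: Annotations, edited: Annotations,
                  live: Annotations) -> Annotations:
    """Hypothetical alternative: compute the user's changes relative to the
    snapshot taken when the modal opened and apply only those to the latest
    server state, preserving unrelated background updates."""
    result = dict(live)
    for key, value in edited.items():
        if opened.get(key) != value:
            result[key] = value  # key added or modified by the user
    for key in opened:
        if key not in edited:
            result.pop(key, None)  # key removed by the user
    return result


# Modal opened, then a background `oc annotate ... foo=bar` lands before saving.
opened = {"owner": "team-a"}
live = {"owner": "team-a", "foo": "bar"}
edited = {"owner": "team-a", "note": "mine"}
print(save_by_replacement(edited))          # {'owner': 'team-a', 'note': 'mine'}
print(save_by_delta(opened, edited, live))  # {'owner': 'team-a', 'foo': 'bar', 'note': 'mine'}
```

Edits to the same key on both sides still need a policy. Kubernetes' optimistic concurrency, where an update carrying a stale metadata.resourceVersion is rejected with 409 Conflict, is one way to detect that case.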

Bug OCPBUGS-24123: Update 4.15 ose-openstack-cinder-csi-driver-container image to be consistent with ART

https://github.com/openshift/cloud-provider-openstack/pull/249

Bug OCPBUGS-2889: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-image-registry-operator/pull/924

Bug OCPBUGS-19117: Update 4.15 ose-olm-catalogd image to be consistent with ART

Please review the following PR: https://github.com/openshift/operator-framework-catalogd/pull/27

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/operator-framework-catalogd/pull/28

Bug OCPBUGS-24812: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/console/pull/13433

Story SDN-4047: Move bug dispatch tools from dougsland to network-tools repo

We heavily rely on scripts located in
https://github.com/dougsland/bz-query
in order to assign Jiras to members of the SDN team.

As the person in charge of tracking the bug load on each of our developers, in order to decide
who is the best person to own unassigned Jiras, I want these scripts in a more formal location.

https://github.com/openshift/network-tools/pull/88

Story CFE-955: Change the owner file on oc-mirror

https://github.com/openshift/oc-mirror/pull/694

Bug OCPBUGS-20246: Unresponsive server API in ipv6 disconnected agent-based hosted cluster

Description of problem:

While installing an IPv6 agent-based hosted cluster in a disconnected environment, the hosted control plane becomes available, but running oc commands against the hosted cluster with its kubeconfig fails with:

E1009 08:05:34.000946  115216 memcache.go:265] couldn't get current server API group list: Get "https://fd2e:6f44:5dd8::58:31765/api?timeout=32s": dial tcp [fd2e:6f44:5dd8::58]:31765: i/o timeout

Version-Release number of selected component (if applicable):

OCP 4.14.0-rc.4

How reproducible:

100%

Expected results:

oc commands run successfully against the hosted cluster.

Bug OCPBUGS-5969: [Nutanix]No host has enough available memory for VM, machine stuck in Provisioning and machineset scale/delete cannot delete machines

Description of problem:

A Nutanix machine for which no host has enough memory is stuck in Provisioning, and machineset scale/delete operations stop working.

Version-Release number of selected component (if applicable):

Server Version: 
4.12.0
4.13.0-0.nightly-2023-01-17-152326

How reproducible:

Always

Steps to Reproduce:

1. Install a Nutanix cluster using the template
https://gitlab.cee.redhat.com/aosqe/flexy-templates/-/tree/master/functionality-testing/aos-4_12/ipi-on-nutanix//versioned-installer
with:
master_num_memory: 32768
worker_num_memory: 16384
networkType: "OVNKubernetes"
installer_payload_image: quay.io/openshift-release-dev/ocp-release:4.12.0-x86_64
2. Scale up the cluster worker machineset from 2 replicas to 40 replicas.
3. Create an infra machineset with 3 replicas and a workload machineset with 1 replica.
Refer to https://docs.openshift.com/container-platform/4.11/machine_management/creating-infrastructure-machinesets.html#machineset-yaml-nutanix_creating-infrastructure-machinesets and configure the following resources:
VCPU=16
MEMORYMB=65536
MEMORYSIZE=64Gi

Actual results:

1. The new infra machines stuck in 'Provisioning' status for about 3 hours.

% oc get machines -A | grep Prov                                               
openshift-machine-api   qili-nut-big-jh468-infra-48mdt      Provisioning                                      175m
openshift-machine-api   qili-nut-big-jh468-infra-jnznv      Provisioning                                      175m
openshift-machine-api   qili-nut-big-jh468-infra-xp7xb      Provisioning                                      175m

2. Checking the Nutanix web console, I found that
infra machine 'qili-nut-big-jh468-infra-jnznv' had the following message:
"
No host has enough available memory for VM qili-nut-big-jh468-infra-48mdt (8d7eb6d6-a71e-4943-943a-397596f30db2) that uses 4 vCPUs and 65536MB of memory. You could try downsizing the VM, increasing host memory, power off some VMs, or moving the VM to a different host. Maximum allowable VM size is approximately 17921 MB
"

Infra machine 'qili-nut-big-jh468-infra-jnznv' is not found.

Infra machine 'qili-nut-big-jh468-infra-xp7xb' shows green with no warning.
However, the must-gather contains this error:
03:23:49  openshift-machine-api  nutanixcontroller  qili-nut-big-jh468-infra-xp7xb  FailedCreate  qili-nut-big-jh468-infra-xp7xb: reconciler failed to Create machine: failed to update machine with vm state: qili-nut-big-jh468-infra-xp7xb: failed to get node qili-nut-big-jh468-infra-xp7xb: Node "qili-nut-big-jh468-infra-xp7xb" not found

3. Scaling the worker machineset down from 40 replicas to 30 replicas does not work. After about 3 hours, there are still 40 Running worker machines and 40 Ready nodes.

% oc get machinesets -A
NAMESPACE               NAME                          DESIRED   CURRENT   READY   AVAILABLE   AGE
openshift-machine-api   qili-nut-big-jh468-infra      3         3                             176m
openshift-machine-api   qili-nut-big-jh468-worker     30        30        30      30          5h1m
openshift-machine-api   qili-nut-big-jh468-workload   1         1                             176m

% oc get machines -A | grep worker| grep Running -c
40

% oc get nodes | grep worker | grep Ready -c
40

4. I deleted the infra machineset, but its machines are still in Provisioning status and do not get deleted.

% oc delete machineset -n openshift-machine-api   qili-nut-big-jh468-infra
machineset.machine.openshift.io "qili-nut-big-jh468-infra" deleted

% oc get machinesets -A
NAMESPACE               NAME                          DESIRED   CURRENT   READY   AVAILABLE   AGE
openshift-machine-api   qili-nut-big-jh468-worker     30        30        30      30          5h26m
openshift-machine-api   qili-nut-big-jh468-workload   1         1                             3h21m

% oc get machines -A | grep -v Running
NAMESPACE               NAME                                PHASE          TYPE   REGION    ZONE              AGE
openshift-machine-api   qili-nut-big-jh468-infra-48mdt      Provisioning                                      3h22m
openshift-machine-api   qili-nut-big-jh468-infra-jnznv      Provisioning                                      3h22m
openshift-machine-api   qili-nut-big-jh468-infra-xp7xb      Provisioning                                      3h22m
openshift-machine-api   qili-nut-big-jh468-workload-qdkvd                                                     3h22m

Expected results:

The new infra machines should be either Running or Failed.
Scaling the cluster worker machineset up and down should not be impacted.

Additional info:

The must-gather download URL will be added in a comment.

https://github.com/openshift/machine-api-provider-nutanix/pull/52

Bug MGMT-16414: When trying to create cluster with s390x architecture, an error occurs that stops cluster creation

Description of the problem:

When trying to create a cluster with the s390x architecture, an error occurs that stops cluster creation. The error is "cannot use Skip MCO reboot because it's not compatible with the s390x architecture on version 4.15.0-ec.3 of OpenShift"

How reproducible:

Always

Steps to reproduce:

Create a cluster with the s390x architecture.

Actual results:

Cluster creation fails.

Expected results:

Cluster creation succeeds.

https://github.com/openshift/assisted-service/pull/5876

Bug OCPBUGS-18188: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/console/pull/13111

Bug OCPBUGS-19050: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/ovn-kubernetes/pull/1952

Bug OCPBUGS-25392: [4.16] OCP 4.15 nightly deployment on a bare-metal server without using the provisioning network is stuck during deployment.

Description of problem:

An OCP 4.15 nightly deployment on bare-metal servers that does not use the provisioning network gets stuck during deployment.

Job history:

https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-telco5g

The deployment gets stuck similarly to this:

Upstream job logs:

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-telco5g/1732520780954079232/artifacts/e2e-telco5g/telco5g-cluster-setup/artifacts/cloud-init-output.log

~~~
level=debug msg=ironic_node_v1.openshift-master-host[2]: Creating...
level=debug msg=ironic_node_v1.openshift-master-host[0]: Creating...
level=debug msg=ironic_node_v1.openshift-master-host[1]: Creating...
level=debug msg=ironic_node_v1.openshift-master-host[0]: Still creating... [10s elapsed]
..
level=debug msg=ironic_node_v1.openshift-master-host[0]: Still creating... [2h28m51s elapsed]
level=debug msg=ironic_node_v1.openshift-master-host[1]: Still creating... [2h28m51s elapsed]
~~~

Ironic logs from bootstrap node:
~~~
Dec 07 13:10:13 localhost.localdomain start-provisioning-nic.sh[3942]: Error: failed to modify ipv4.addresses: invalid IP address: Invalid IPv4 address ''.
Dec 07 13:10:13 localhost.localdomain systemd[1]: provisioning-interface.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Dec 07 13:10:13 localhost.localdomain systemd[1]: provisioning-interface.service: Failed with result 'exit-code'.
Dec 07 13:10:13 localhost.localdomain systemd[1]: Failed to start Provisioning interface.
Dec 07 13:10:13 localhost.localdomain systemd[1]: Dependency failed for DHCP Service for Provisioning Network.
Dec 07 13:10:13 localhost.localdomain systemd[1]: ironic-dnsmasq.service: Job ironic-dnsmasq.service/start failed with result 'dependency'
~~~

Version-Release number of selected component (if applicable):

4.15

How reproducible:

Every time

Steps to Reproduce:

1. Deploy OCP.

More information about our setup:
In our environment, we have 3 virtual master nodes, 1 virtual worker, and 1 bare-metal worker. We use the KCLI tool to create the virtual environment and to run the IPI deployment workflow. Our setup does not use a provisioning network. (The same setup is used for other OCP versions up to 4.14, and those deployments work fine.)

We have attached our install-config.yaml (for RH employees) and logs from bootstrap node.

Actual results:

Deployment is failing

Dec 07 13:10:13 localhost.localdomain start-provisioning-nic.sh[3942]: Error: failed to modify ipv4.addresses: invalid IP address: Invalid IPv4 address ''.
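The log line shows an empty string reaching the IPv4 address parser. As a hedged illustration only (the actual fix is in the linked ironic-image PR and may differ), a setup step can treat an unset address as "no provisioning network" instead of failing. `provisioning_address` is a hypothetical helper and the address below is an example value.

```python
import ipaddress
from typing import Optional


def provisioning_address(raw: str) -> Optional[ipaddress.IPv4Interface]:
    """Parse a provisioning address in CIDR form (e.g. "172.22.0.3/24").

    Returns None when the value is empty, so the caller can skip configuring
    the provisioning interface; malformed non-empty values still raise
    ValueError so real misconfigurations are not hidden.
    """
    raw = raw.strip()
    if not raw:
        return None  # no provisioning network configured
    return ipaddress.IPv4Interface(raw)


for value in ["", "172.22.0.3/24"]:
    addr = provisioning_address(value)
    print("skip provisioning interface" if addr is None else f"configure {addr}")
```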

Expected results:

Deployment should pass

https://github.com/openshift/ironic-image/pull/440

Bug OCPBUGS-18567: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-version-operator/pull/965

Bug OCPBUGS-19283: Update 4.15 ose-cli-artifacts image to be consistent with ART

Please review the following PR: https://github.com/openshift/oc/pull/1546

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/oc/pull/1546

Bug OCPBUGS-22453: OKD: ABI is broken for OKD/FCOS when disconnected registry is a subdomain of cluster domain

When the disconnected image registry is hosted at a subdomain of the cluster domain, the Agent-based Installer fails to install an OKD/FCOS cluster. The rendezvous host starts bootkube.sh, which fails because it cannot resolve the registry DNS name:

Oct 25 12:47:03 master-0 bootkube.sh[6462]: error: unable to read image virthost.ostest.test.metalkube.org:5000/localimages/local-release-image@sha256:76562238a20f2f4dd45770f00730e20425edd376d30d58d7dafb5d6f02b208c5: Get "https://virthost.ostest.test.metalkube.org:5000/v2/": dial tcp: lookup virthost.ostest.test.metalkube.org: no such host
Oct 25 12:47:03 master-0 systemd[1]: bootkube.service: Main process exited, code=exited, status=1/FAILURE
Oct 25 12:47:03 master-0 systemd[1]: bootkube.service: Failed with result 'exit-code'.

This hit the OpenShift CI jobs 'okd-e2e-agent-compact-ipv4' and 'okd-e2e-agent-sno-ipv6', which are based on openshift-metal3/dev-scripts. For example, the OCP cluster domain (which contains the cluster name) is `ostest.test.metalkube.org` and the disconnected image registry is at `virthost.ostest.test.metalkube.org`.
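The triggering condition can be checked mechanically. Below is a minimal sketch with hypothetical helpers (not installer code) that tells whether the release image's registry host lies under the cluster domain, which is the configuration this report says breaks. It assumes a fully qualified image reference whose registry is a DNS name rather than an IPv6 literal.

```python
def registry_host(image_ref: str) -> str:
    """Return the registry hostname from an image reference such as
    "host:5000/repo/name@sha256:...", dropping any port."""
    registry = image_ref.split("/", 1)[0]
    return registry.rsplit(":", 1)[0]


def is_within_domain(host: str, domain: str) -> bool:
    """True if host equals domain or is a subdomain of it.
    Compares whole DNS labels, case-insensitively, ignoring a trailing dot."""
    host = host.lower().rstrip(".")
    domain = domain.lower().rstrip(".")
    return host == domain or host.endswith("." + domain)


ref = ("virthost.ostest.test.metalkube.org:5000/localimages/local-release-image"
       "@sha256:76562238a20f2f4dd45770f00730e20425edd376d30d58d7dafb5d6f02b208c5")
print(is_within_domain(registry_host(ref), "ostest.test.metalkube.org"))  # True
```

Matching on a leading "." keeps a host such as `xostest.test.metalkube.org` from being treated as part of the cluster domain.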

Other diagnosis from the rendezvous host:

[core@master-0 ~]$ sudo podman pull virthost.ostest.test.metalkube.org:5000/localimages/local-release-image@sha256:76562238a20f2f4dd45770f00730e20425edd376d30d58d7dafb5d6f02b208c5
Trying to pull virthost.ostest.test.metalkube.org:5000/localimages/local-release-image@sha256:76562238a20f2f4dd45770f00730e20425edd376d30d58d7dafb5d6f02b208c5...
Error: initializing source docker://virthost.ostest.test.metalkube.org:5000/localimages/local-release-image@sha256:76562238a20f2f4dd45770f00730e20425edd376d30d58d7dafb5d6f02b208c5: pinging container registry virthost.ostest.test.metalkube.org:5000: Get "https://virthost.ostest.test.metalkube.org:5000/v2/": dial tcp: lookup virthost.ostest.test.metalkube.org: no such host

curl -u ocp-user:ocp-pass https://virthost.ostest.test.metalkube.org:5000/v2/_catalog 
curl: (6) Could not resolve host: virthost.ostest.test.metalkube.org

[core@master-0 ~]$ dig +noall +answer virthost.ostest.test.metalkube.org
;; communications error to 127.0.0.1#53: connection refused
;; communications error to 127.0.0.1#53: connection refused
;; communications error to 127.0.0.1#53: connection refused
virthost.ostest.test.metalkube.org. 0 IN A      192.168.111.1

After stopping systemd-resolved:

[core@master-0 ~]$ curl -u ocp-user:ocp-pass https://virthost.ostest.test.metalkube.org:5000/v2/_catalog 
{"repositories":["localimages/installer","localimages/local-release-image"]}

Report and diagnosis output above from Andrea Fasano.

https://github.com/openshift/installer/pull/7634

Bug OCPBUGS-23912: add missing vulnerabilities column and Signed icon in PAC repository PLR list

Description of problem:

Add the missing vulnerabilities column and Signed icon to the PipelineRun (PLR) list of Pipelines as Code (PAC) repositories, as on the PipelineRuns list page.

https://github.com/openshift/console/pull/13364

Bug OCPBUGS-23742: Bump cluster-ingress-operator to Kubernetes 1.28 for 4.15

Description of problem

The cluster-ingress-operator repository vendors controller-runtime v0.15.0, which uses Kubernetes 1.27 packages. OpenShift 4.15 is based on Kubernetes 1.28.

Version-Release number of selected component (if applicable)

4.15.

How reproducible

Always.

Steps to Reproduce

Check https://github.com/openshift/cluster-ingress-operator/blob/release-4.15/go.mod.

Actual results

The sigs.k8s.io/controller-runtime package is at v0.15.0.

Expected results

The sigs.k8s.io/controller-runtime package is at v0.16.0 or newer.

Additional info

https://github.com/openshift/cluster-ingress-operator/pull/990 already bumped the k8s.io/* packages to v0.28.2, but ideally the controller-runtime package should be bumped too. The controller-runtime v0.16 release includes some breaking changes; see the release notes at https://github.com/kubernetes-sigs/controller-runtime/releases/tag/v0.16.0.
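The check in "Steps to Reproduce" can be scripted. Below is a minimal sketch that reads the controller-runtime requirement out of go.mod text and compares it with v0.16.0. `required_version` is a hypothetical helper and `GO_MOD` is an illustrative excerpt, not the real file; a real tool would use a proper parser such as golang.org/x/mod/modfile.

```python
import re
from typing import Optional, Tuple


def required_version(go_mod: str, module: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch) for `module` as required in go.mod text.

    Handles both `require mod vX.Y.Z` and lines inside a `require (...)` block.
    Regex simplification: replace directives are ignored and pre-release or
    pseudo-version suffixes are reduced to their base version.
    """
    pattern = re.compile(
        r"^\s*(?:require\s+)?" + re.escape(module) + r"\s+v(\d+)\.(\d+)\.(\d+)",
        re.MULTILINE,
    )
    match = pattern.search(go_mod)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


GO_MOD = """module github.com/openshift/cluster-ingress-operator

require (
\tk8s.io/api v0.28.2
\tsigs.k8s.io/controller-runtime v0.15.0
)
"""
version = required_version(GO_MOD, "sigs.k8s.io/controller-runtime")
print(version, "OK" if version and version >= (0, 16, 0) else "needs bump")
```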

https://github.com/openshift/cluster-ingress-operator/pull/1001

Bug OCPBUGS-24155: Update 4.15 ose-prometheus-adapter-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/k8s-prometheus-adapter/pull/95

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/k8s-prometheus-adapter/pull/95

Bug OCPBUGS-18857: Update 4.15 ose-cluster-samples-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-samples-operator/pull/517

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-samples-operator/pull/517

Bug OCPBUGS-19272: Update 4.15 ose-powervs-cloud-controller-manager image to be consistent with ART

Please review the following PR: https://github.com/openshift/cloud-provider-powervs/pull/43

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cloud-provider-powervs/pull/43

Bug OCPBUGS-19332: Inconsistent impersonation in hypershift dump

Description of problem:

hypershift dump fails to acquire the localhost-kubeconfig when impersonating. When attempting to dump the guest cluster, it fails to read Secrets from the HCP namespace on the management cluster. As a result, it cannot access anything in the guest cluster and fails to dump it.

Version-Release number of selected component (if applicable):

Hypershift 0.1.11
Supported OCP version 4.15.0

How reproducible:

100%

Steps to Reproduce:

Execute hypershift dump cluster --as backplane-cluster-admin --name ${CLUSTER_NAME} --namespace ocm-${ENVIRONMENT}-${CLUSTER_ID} --dump-guest-cluster  --artifact-dir ${DIR_NAME}

Actual results:

After a while, a failure message appears showing a permission error while acquiring the localhost-kubeconfig.

Expected results:

The localhost-kubeconfig is acquired correctly and the guest cluster is dumped successfully.

https://github.com/openshift/hypershift/pull/3011

Bug OCPBUGS-20205: origin test suite should not assume a local image registry

Description of problem

ImageRegistry became an optional component in 4.14 (docs#64469, api#1572), and even before that it could be disabled with managementState: Removed. However, the no-capabilities test is currently failing with:

message: Back-off pulling image "image-registry.openshift-image-registry.svc:5000/openshift/tools:latest"

in clusters without a local registry. We should teach the origin suite to be more forgiving of a lack of internal registry.

Version-Release number of selected component (if applicable):

4.14 and 4.15, although 4.14 may now be stable enough that backporting fixes for its no-capabilities jobs is not worthwhile.

How reproducible:

100%

Steps to Reproduce:

1. Open a recent 4.15 no-cap run and see if it passed.

Actual results:

Many test cases fail to pull from image-registry.openshift-image-registry.svc:5000, which isn't expected to exist in these clusters, where the ImageRegistry capability is not requested.

Expected results:

CI test cases pass.

Additional info:

I'm fuzzy on the relationship between ImageStreams and the local image registry, but at the moment, the tools ImageStreams and such are still part of no-caps runs:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-aws-ovn-no-capabilities/1709539450616287232/artifacts/e2e-aws-ovn-no-capabilities/gather-must-gather/artifacts/must-gather.tar | tar xOz 31e3c46d361008f321d02ef278f62b1fc4e5510a9902c8ac16de5b2078fed849/namespaces/openshift/image.openshift.io/imagestreams.yaml | yaml2json | jq -r '.items[] | select(.metadata.name == "tools").status'
{
  "dockerImageRepository": "",
  "tags": [
    {
      "items": [
        {
          "created": "2023-10-04T12:24:42Z",
          "dockerImageReference": "registry.ci.openshift.org/ocp/4.15-2023-10-04-015153@sha256:a83089cbb8a8f4ef868e5f37de5d305c10056e4e9761ad37b7c1ab98f465a553",
          "generation": 2,
          "image": "sha256:a83089cbb8a8f4ef868e5f37de5d305c10056e4e9761ad37b7c1ab98f465a553"
        }
      ],
      "tag": "latest"
    }
  ]
}
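Given the status above (an empty dockerImageRepository, but a resolvable dockerImageReference), one way a test could avoid hard-coding the internal registry is to resolve the tag from the ImageStream status. A minimal sketch under that assumption; `tag_pull_spec` is hypothetical and not necessarily what the linked origin PR does. It also assumes items[0] is the current image for the tag.

```python
from typing import Any, Dict, Optional


def tag_pull_spec(status: Dict[str, Any], tag: str = "latest") -> Optional[str]:
    """Choose a pull spec for an ImageStream tag from the stream's status.

    Uses the internal registry only when the stream advertises a repository
    there (non-empty dockerImageRepository); otherwise falls back to the
    dockerImageReference recorded for the tag.
    """
    repository = status.get("dockerImageRepository")
    if repository:
        return f"{repository}:{tag}"
    for entry in status.get("tags", []):
        if entry.get("tag") == tag and entry.get("items"):
            return entry["items"][0]["dockerImageReference"]  # assumed current
    return None


status = {
    "dockerImageRepository": "",
    "tags": [{"tag": "latest", "items": [{"dockerImageReference":
        "registry.ci.openshift.org/ocp/4.15-2023-10-04-015153@sha256:a83089cbb8a8f4ef868e5f37de5d305c10056e4e9761ad37b7c1ab98f465a553"}]}],
}
print(tag_pull_spec(status))
```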

https://github.com/openshift/origin/pull/28307

Bug OCPBUGS-25996: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/installer/pull/7898

Bug OCPBUGS-19019: Ironic: Invalid cross-device link

When using metal-ipi with okd-scos, Ironic fails to provision nodes.
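"Invalid cross-device link" is the Linux error text for EXDEV, which rename(2) returns when the source and destination are on different filesystems. This report does not state whether that is the root cause here; as a general illustration, a move can fall back to copy-and-delete. `move_file` is a hypothetical helper.

```python
import errno
import os
import shutil


def move_file(src: str, dst: str) -> None:
    """Move src to dst, falling back to copy-and-delete when a plain rename
    fails with EXDEV because src and dst are on different filesystems."""
    try:
        os.rename(src, dst)
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise  # unrelated failure: propagate it
        shutil.move(src, dst)  # copies the data across filesystems, then removes src
```

shutil.move already performs this fallback on its own; the explicit try block only makes the EXDEV case visible.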

https://github.com/openshift/ironic-image/pull/398

Bug OCPBUGS-25324: Last visited tab not get selected on Pipelines page in dev perspective

https://github.com/openshift/console/pull/13443

Bug OCPBUGS-25812: [OCP 4.15] VM stuck in terminating state after OCP node crash

Description of problem:

After an OCP node is crashed manually, the OSPD VM that was running on it is stuck in the Terminating state.

Version-Release number of selected component (if applicable):

OCP 4.12.15 
osp-director-operator.v1.3.0
kubevirt-hyperconverged-operator.v4.12.5

How reproducible:

Log in to an OCP 4.12.15 node that is running a VM.
Manually crash the master node.
After the reboot, the VM stays in the Terminating state.

Steps to Reproduce:

    1. ssh core@masterX 
    2. sudo su
    3. echo c > /proc/sysrq-trigger

Actual results:

After the reboot, the VM stays in the Terminating state.


$ omc get node|sed -e 's/modl4osp03ctl/model/g' | sed -e 's/telecom.tcnz.net/aaa.bbb.ccc/g'
NAME                               STATUS   ROLES                         AGE   VERSION
model01.aaa.bbb.ccc   Ready    control-plane,master,worker   91d   v1.25.8+37a9a08
model02.aaa.bbb.ccc   Ready    control-plane,master,worker   91d   v1.25.8+37a9a08
model03.aaa.bbb.ccc   Ready    control-plane,master,worker   91d   v1.25.8+37a9a08


$ omc get pod -n openstack 
NAME                                                        READY   STATUS         RESTARTS   AGE
openstack-provision-server-7b79fcc4bd-x8kkz                 2/2     Running        0          8h
openstackclient                                             1/1     Running        0          7h
osp-director-operator-controller-manager-5896b5766b-sc7vm   2/2     Running        0          8h
osp-director-operator-index-qxxvw                           1/1     Running        0          8h
virt-launcher-controller-0-9xpj7                            1/1     Running        0          20d
virt-launcher-controller-1-5hj9x                            1/1     Running        0          20d
virt-launcher-controller-2-vhd69                            0/1     NodeAffinity   0          43d

$ omc describe  pod virt-launcher-controller-2-vhd69 |grep Status:
Status:                    Terminating (lasts 37h)

$ xsos sosreport-xxxx/|grep time
...
  Boot time: Wed Nov 22 01:44:11 AM UTC 2023
  Uptime:    8:27,  0 users

Expected results:

The VM restarts automatically, or at least does not stay in the Terminating state.

Additional info:

The issue has been seen twice.

The first time, a kernel crash occurred and the associated VM on the node was left in the Terminating state.

The second time, we tried to reproduce the issue by crashing the kernel manually and got the same result:
the VM running on the OCP node stayed in the Terminating state.

https://github.com/openshift/kubernetes/pull/1832

Bug OCPBUGS-22152: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-api-provider-vsphere/pull/22

Bug OCPBUGS-24147: Update 4.15 ose-cluster-bootstrap-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-bootstrap/pull/101

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-bootstrap/pull/101

Bug OCPBUGS-24203: Metrics: ConsolePlugins no longer need to be grouped

Task MON-2642: Write e2e tests for the alertrelabelconfigs CRD

OCP 4.11 ships the alertrelabelconfigs CRD as a Technology Preview feature. Before it graduates to GA, we need e2e tests in the CMO repository.

AC:

- End-to-end tests in the CMO repository validating:
  - Create/update/delete of alertrelabelconfigs
  - Invalid resources are rejected
- Configuration of a blocking job in openshift/release.

https://github.com/openshift/cluster-monitoring-operator/pull/2080

Bug OCPBUGS-19501: Add additional certificate acceptance condition feature in ovnkube-identity

https://github.com/openshift/ovn-kubernetes/pull/1895

Bug ACM-7713: Must Gather does not get ConfigurationPolicies

Description of problem:

When looking at an ACM must-gather for a managed cluster, no information for the ConfigurationPolicies can be seen. It appears that this command in the must-gather script has an error:

oc adm inspect configurationpolicies.policy.open-cluster-management.io --all-namespaces  --dest-dir=must-gather

The error (which is not logged in the must-gather itself...) looks like:

error: errors ocurred while gathering data:
    skipping gathering  due to error: the server doesn't have a resource type ""

Expected results:

ConfigurationPolicy YAML should be collected in the must-gather to help in debugging.

https://github.com/openshift/oc/pull/1550

Bug OCPBUGS-15201: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-network-operator/pull/1981

Bug OCPBUGS-25782: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/hypershift/pull/3321

Bug OCPBUGS-26759: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/csi-driver-shared-resource/pull/162

Bug OCPBUGS-19170: Update 4.15 azure-file-csi-driver-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/azure-file-csi-driver-operator/pull/74

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/azure-file-csi-driver-operator/pull/74

Bug OCPBUGS-23010: Alibaba volume snapshots never become ready

Description of problem:

On Alibaba, some volume snapshots never become ready.

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2023-11-06-182702

How reproducible: sometimes

Steps to Reproduce:

1. Create a PVC and a Pod
2. Create a VolumeSnapshot of the PVC
3. Observe that the VolumeSnapshot never becomes "ready".

Actual results:

$ oc get volumesnapshot
NAME          READYTOUSE   SOURCEPVC   ...
mysnapl587m   false         myclaim     ...

Expected results:

The VolumeSnapshot becomes ready in ~1 minute or less (for small volumes)

Additional info:

There seems to be something odd between the external-snapshotter and the CSI driver. From the snapshotter logs:

1. The external-snapshotter calls CreateSnapshot for the first time and gets an unready snapshot (like "readyToUse [false]").
2. The snapshotter calls CreateSnapshot again and gets an error (the Alibaba CSI driver applies some throttling). This happens a few times in sequence.
3. Finally, the snapshotter calls CreateSnapshot and gets an unready snapshot again instead of the throttling error. At this point, the snapshotter stops and does not keep calling CreateSnapshot to get a ready snapshot.

This sequence is very timing-sensitive: sometimes the cloud finishes the snapshot during step 2, so the driver gets a ready snapshot at step 3 and everything works.

(Sorry, I lost the full logs...)
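
One way to read the expected behavior in the sequence above: the snapshotter should keep calling CreateSnapshot until the snapshot reports readyToUse, treating both throttling errors and unready responses as "try again" rather than stopping. The Go sketch below models that loop against a fake driver that replays the reported sequence (unready, throttled, throttled, unready, then ready). Everything in it (`driver`, `fakeDriver`, `errThrottled`, `waitForReadySnapshot`) is hypothetical and is not the external-snapshotter or CSI gRPC API; it also simplifies error classification to a single sentinel error and a fixed backoff.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errThrottled stands in for the driver's throttling error (hypothetical).
var errThrottled = errors.New("throttled")

// snapshotStatus is a minimal stand-in for a CreateSnapshot response.
type snapshotStatus struct {
	ID         string
	ReadyToUse bool
}

// driver is a hypothetical, simplified view of the CSI CreateSnapshot call.
type driver interface {
	CreateSnapshot(name string) (snapshotStatus, error)
}

// fakeDriver replays a fixed sequence of responses, one per call, and keeps
// returning the last one once the sequence is exhausted.
type fakeDriver struct {
	responses []func() (snapshotStatus, error)
	calls     int
}

func (d *fakeDriver) CreateSnapshot(name string) (snapshotStatus, error) {
	i := d.calls
	if i >= len(d.responses) {
		i = len(d.responses) - 1
	}
	d.calls++
	return d.responses[i]()
}

// waitForReadySnapshot keeps calling CreateSnapshot (the CSI spec requires it
// to be idempotent per snapshot name) until the snapshot is ready, retrying
// throttling errors and unready responses alike, up to maxAttempts.
func waitForReadySnapshot(d driver, name string, maxAttempts int, backoff time.Duration) (snapshotStatus, error) {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		st, err := d.CreateSnapshot(name)
		switch {
		case errors.Is(err, errThrottled):
			// Retryable: fall through to the backoff below.
		case err != nil:
			return snapshotStatus{}, err
		case st.ReadyToUse:
			return st, nil
		}
		// An unready snapshot is not a terminal state: keep polling.
		time.Sleep(backoff)
	}
	return snapshotStatus{}, fmt.Errorf("snapshot %q not ready after %d attempts", name, maxAttempts)
}

func main() {
	unready := func() (snapshotStatus, error) { return snapshotStatus{ID: "snap-1"}, nil }
	throttled := func() (snapshotStatus, error) { return snapshotStatus{}, errThrottled }
	ready := func() (snapshotStatus, error) { return snapshotStatus{ID: "snap-1", ReadyToUse: true}, nil }

	d := &fakeDriver{responses: []func() (snapshotStatus, error){unready, throttled, throttled, unready, ready}}
	st, err := waitForReadySnapshot(d, "mysnap", 10, time.Millisecond)
	fmt.Println(st.ReadyToUse, err, d.calls) // true <nil> 5
}
```

The key design point is that the loop's exit condition is readiness (or a non-retryable error), not "the last call returned without error", which is the condition the reported sequence suggests the snapshotter stopped on.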

https://github.com/openshift/csi-external-snapshotter/pull/114

Bug OCPBUGS-23769: After PatternFly5 update: Topology list view hover state is incorrect

Issue 43 from https://docs.google.com/spreadsheets/d/1TR3ENY-GE_LQL9F-xH6NahtRHdu6IrYGhLqDDA8y0EI/edit#gid=1035185624

The active and hover states for the Topology list view are incorrect.

Screenshot: https://drive.google.com/file/d/1DMwmYsvdHXvMBYr0gOD9mActmJNMaH6z/view?usp=share_link

https://github.com/openshift/console/pull/13368

Bug OCPBUGS-24163: Update 4.15 ose-aws-pod-identity-webhook-container image to be consistent with ART

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/aws-pod-identity-webhook/pull/179

Bug OCPBUGS-26758: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/csi-driver-shared-resource/pull/162

Bug OCPBUGS-15215: OAuth template config in HostedCluster.spec.configuration.oauth is not honored in HyperShift

Description of problem:

Standalone OpenShift allows customizing templates for OAuth via the oauth.config.openshift.io/cluster resource. In HyperShift, this is done via the HostedCluster.spec.configuration.oauth field. However, setting a reference to secrets in these fields does not take effect on a HyperShift cluster.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

always

Steps to Reproduce:

1. Create a HostedCluster and specify alternate templates for OAuth via the HostedCluster.spec.configuration.oauth field.
2. View the oauth UI by attempting to log in to the OpenShift console.
3.

Actual results:

The alternate OAuth templates do not take effect.

Expected results:

The templates change the look of the OAuth login page.

Additional info:

https://github.com/openshift/hypershift/pull/3041

Bug OCPBUGS-17654: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/kubernetes/pull/1665

Bug OCPBUGS-22772: segmentation violation code=0x1, github.com/openshift/installer/pkg/asset.PersistToFile

Description of problem:

level=error
level=error msg=Error: waiting for EC2 Instance (i-054a010f3e99f7a2c) create: timeout while waiting for state to become 'running' (last state: 'pending', timeout: 10m0s)
level=error
level=error msg=  with module.masters.aws_instance.master[2],
level=error msg=  on master/main.tf line 136, in resource "aws_instance" "master":
level=error msg= 136: resource "aws_instance" "master" {
level=error
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1936dcc]

goroutine 1 [running]:
github.com/openshift/installer/pkg/asset.PersistToFile({0x22860140?, 0x277372f0?}, {0x7ffc102e22db, 0xe})
	/go/src/github.com/openshift/installer/pkg/asset/asset.go:57 +0xac
github.com/openshift/installer/pkg/asset.(*fileWriterAdapter).PersistToFile(0x227fa3e0?, {0x7ffc102e22db?, 0x277372f0?})
	/go/src/github.com/openshift/installer/pkg/asset/filewriter.go:19 +0x31
main.runTargetCmd.func1({0x7ffc102e22db, 0xe})
	/go/src/github.com/openshift/installer/cmd/openshift-install/create.go:277 +0x24a
main.runTargetCmd.func2(0x275d0340?, {0xc0007a6d00?, 0x1?, 0x1?})
	/go/src/github.com/openshift/installer/cmd/openshift-install/create.go:302 +0xe7
github.com/spf13/cobra.(*Command).execute(0x275d0340, {0xc0007a6cc0, 0x1, 0x1})
	/go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:920 +0x847
github.com/spf13/cobra.(*Command).ExecuteC(0xc000956000)
	/go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:1040 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
	/go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:968
main.installerMain()
	/go/src/github.com/openshift/installer/cmd/openshift-install/main.go:56 +0x2b0
main.main()
	/go/src/github.com/openshift/installer/cmd/openshift-install/main.go:33 +0xff
Installer exit with code 2

Version-Release number of selected component (if applicable):

4.15

How reproducible:

I noticed it on a presubmit

Steps to Reproduce:

1. Run the pull-ci-openshift-origin-master-e2e-aws-ovn-fips job on an openshift/origin repo presubmit.
2.
3.

Actual results:

Expected results:

Additional info:

Example where it occurred: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/28372/pull-ci-openshift-origin-master-e2e-aws-ovn-fips/1719449092209250304

This search shows it happened on several jobs: https://search.ci.openshift.org/?search=asset.PersistToFile&maxAge=48h&context=1&type=build-log&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

https://github.com/openshift/installer/pull/7671

Bug OCPBUGS-17279: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-api-provider-aws/pull/478

Bug OCPBUGS-19237: Update 4.15 cluster-monitoring-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-monitoring-operator/pull/2084

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-monitoring-operator/pull/2084

Bug OCPBUGS-19625: Multus per node certificates: CNO integration

Description of problem: Multus should implement per-node certificates through integration with the CNO.

https://github.com/openshift/cluster-network-operator/pull/2009

Bug OCPBUGS-21736: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/baremetal-operator/pull/313

Bug OCPBUGS-21633: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/prometheus/pull/173

Bug OCPBUGS-25938: Update downstream OWNERS to include Surya

This is a clone of issue OCPBUGS-25810. The following is the description of the original issue:
—
No QA required, updating approvers across releases

https://github.com/openshift/ovn-kubernetes/pull/2005

Bug OCPBUGS-26516: Bump Helm version to 3.13 in ODC in release branch 4.15

Story (Required)
As an ODC Helm backend developer, I would like to bump the Helm version to 3.13 to stay in sync with the version we will ship with OCP 4.15.

Background (Required)
This is a routine activity we do every time a new OCP version is released, to stay current.

Glossary
NA

Out of scope
NA

Approach(Required)
Bump the Helm version to 3.13, then run the build and unit tests and make sure everything works as expected. Last time we had a conflict with the DevFile backend.

Dependencies
There might be dependencies on the DevFile team to move some dependencies forward.

https://github.com/openshift/console/pull/13465

Bug OCPBUGS-21631: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-monitoring-operator/pull/2120

Bug OCPBUGS-21775: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/multus-admission-controller/pull/70

Bug OCPBUGS-5491: [Azure] EgressIP cannot be applied to the egress node on Azure private cluster

Description of problem:

The issue was found in CI on an Azure private cluster: all the EgressIP cases failed because the EgressIP cannot be applied to the egress node. The issue can also be reproduced manually.

Version-Release number of selected component (if applicable):

4.12.0-0.nightly-2023-01-08-142418

How reproducible:

Always

Steps to Reproduce:

1. Label one worker node as egress node
2. Create one egressIP object
3.

Actual results:

% oc get egressip
NAME             EGRESSIPS    ASSIGNED NODE   ASSIGNED EGRESSIPS
egressip-2       10.0.1.10                    
egressip-47164   10.0.1.217 

% oc get cloudprivateipconfig 
NAME         AGE
10.0.1.10    18m
10.0.1.217   22m
% oc get cloudprivateipconfig  -o yaml
apiVersion: v1
items:
- apiVersion: cloud.network.openshift.io/v1
  kind: CloudPrivateIPConfig
  metadata:
    annotations:
      k8s.ovn.org/egressip-owner-ref: egressip-2
    creationTimestamp: "2023-01-09T10:11:33Z"
    finalizers:
    - cloudprivateipconfig.cloud.network.openshift.io/finalizer
    generation: 1
    name: 10.0.1.10
    resourceVersion: "59723"
    uid: d697568a-7d7c-471a-b5e1-d7b814244549
  spec:
    node: huirwang-0109b-bv4ld-worker-eastus1-llmpb
  status:
    conditions:
    - lastTransitionTime: "2023-01-09T10:17:06Z"
      message: 'Error processing cloud assignment request, err: network.InterfacesClient#CreateOrUpdate:
        Failure sending request: StatusCode=0 -- Original Error: Code="OutboundRuleCannotBeUsedWithBackendAddressPoolThatIsReferencedBySecondaryIpConfigs"
        Message="OutboundRule /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/huirwang-0109b-bv4ld-rg/providers/Microsoft.Network/loadBalancers/huirwang-0109b-bv4ld/outboundRules/outbound-rule-v4
        cannot be used with Backend Address Pool /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/huirwang-0109b-bv4ld-rg/providers/Microsoft.Network/loadBalancers/huirwang-0109b-bv4ld/backendAddressPools/huirwang-0109b-bv4ld
        that contains Secondary IPConfig /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/huirwang-0109b-bv4ld-rg/providers/Microsoft.Network/networkInterfaces/huirwang-0109b-bv4ld-worker-eastus1-llmpb-nic/ipConfigurations/huirwang-0109b-bv4ld-worker-eastus1-llmpb_10.0.1.10"
        Details=[]'
      observedGeneration: 1
      reason: CloudResponseError
      status: "False"
      type: Assigned
    node: huirwang-0109b-bv4ld-worker-eastus1-llmpb
- apiVersion: cloud.network.openshift.io/v1
  kind: CloudPrivateIPConfig
  metadata:
    annotations:
      k8s.ovn.org/egressip-owner-ref: egressip-47164
    creationTimestamp: "2023-01-09T10:07:56Z"
    finalizers:
    - cloudprivateipconfig.cloud.network.openshift.io/finalizer
    generation: 1
    name: 10.0.1.217
    resourceVersion: "58333"
    uid: 6a7d6196-cfc9-4859-9150-7371f5818b74
  spec:
    node: huirwang-0109b-bv4ld-worker-eastus1-llmpb
  status:
    conditions:
    - lastTransitionTime: "2023-01-09T10:13:29Z"
      message: 'Error processing cloud assignment request, err: network.InterfacesClient#CreateOrUpdate:
        Failure sending request: StatusCode=0 -- Original Error: Code="OutboundRuleCannotBeUsedWithBackendAddressPoolThatIsReferencedBySecondaryIpConfigs"
        Message="OutboundRule /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/huirwang-0109b-bv4ld-rg/providers/Microsoft.Network/loadBalancers/huirwang-0109b-bv4ld/outboundRules/outbound-rule-v4
        cannot be used with Backend Address Pool /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/huirwang-0109b-bv4ld-rg/providers/Microsoft.Network/loadBalancers/huirwang-0109b-bv4ld/backendAddressPools/huirwang-0109b-bv4ld
        that contains Secondary IPConfig /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/huirwang-0109b-bv4ld-rg/providers/Microsoft.Network/networkInterfaces/huirwang-0109b-bv4ld-worker-eastus1-llmpb-nic/ipConfigurations/huirwang-0109b-bv4ld-worker-eastus1-llmpb_10.0.1.217"
        Details=[]'
      observedGeneration: 1
      reason: CloudResponseError
      status: "False"
      type: Assigned
    node: huirwang-0109b-bv4ld-worker-eastus1-llmpb
kind: List
metadata:
  resourceVersion: ""

Expected results:

EgressIP can be applied correctly

Additional info:

https://github.com/openshift/cloud-network-config-controller/pull/121

Bug OCPBUGS-21671: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/oauth-server/pull/137

Bug TRT-1368: 4.15 Nightly Payloads Failing on GCP Credentials Quota

Hit seemingly every job in the last payload:

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-ovn-rt-upgrade/1728713834463498240

Credentials request shows:

"conditions": [
  {
    "lastProbeTime": "2023-11-26T11:20:40Z",
    "lastTransitionTime": "2023-11-26T11:20:40Z",
    "message": "failed to grant creds: error syncing creds in mint-mode: error creating custom role: rpc error: code = ResourceExhausted desc = Maximum number of roles reached. Maximum is: 300\nerror details: retry in 24h0m1s",
    "reason": "CredentialsProvisionFailure",
    "status": "True",
    "type": "CredentialsProvisionFailure"
  }
],

We've heard a new GCP account is live, but we're not sure whether these jobs are landing in it. If they are, perhaps a limit needs to be bumped?

Additional info

This issue shows up as a Cluster Version Operator component readiness regression due to failing the following tests:

[sig-cluster-lifecycle] Cluster completes upgrade
[sig-arch][Feature:ClusterUpgrade] Cluster should be upgradeable after finishing upgrade [Late][Suite:upgrade]
[sig-arch][Feature:ClusterUpgrade] Cluster should remain functional during upgrade [Disruptive] [Serial]

https://github.com/openshift/cluster-image-registry-operator/pull/965

Bug OCPBUGS-19222: Update 4.15 cluster-version-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-version-operator/pull/970

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-version-operator/pull/970

Bug OCPBUGS-21853: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Bug OCPBUGS-24108: Update 4.15 baremetal-machine-controller-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-api-provider-baremetal/pull/205

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-api-provider-baremetal/pull/205

Bug OCPBUGS-24301: capo-controller-manager enters crash backoff loop

cluster-capi-operator is incorrectly updating the container command to /bin/cluster-api-provider-openstack-manager. It should leave it alone because it is already correct.

https://github.com/openshift/cluster-capi-operator/pull/148

Bug OCPBUGS-17851: CPMS assumes diff on empty zone in Azure

Description of problem:


When we merged https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/229, it changed the way failure domains were injected for Azure so that additional fields could be accounted for. However, the CPMS failure domains have Azure zones as a string (which they should be) and the machine v1beta1 spec has them as a string pointer.

This means the CPMS now detects a difference between a nil zone and an empty string, even though every other piece of code in OpenShift treats them the same.

We should update the machine v1beta1 type to remove the pointer. This will be a no-op in terms of the data stored in etcd since the type is unstructured anyway.

It will then require updates to the MAPZ, CPMS, MAO and installer repositories to update their generation.
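
To illustrate the mismatch described above, here is a minimal Go sketch. The `cpmsFailureDomain`, `machineProviderSpec`, and `machineProviderSpecFixed` structs are hypothetical stand-ins that model only the zone field of the CPMS Azure failure domain and the machine v1beta1 Azure spec; they are not the real API types, and `naiveZonesDiffer` is an assumed simplification of how the injected failure domain is compared. Injecting the failure domain's zone into a pointer field and comparing with `reflect.DeepEqual` reports a diff whenever the machine's zone is nil, while the proposed fix (a plain string field) makes an unset zone and an empty zone the same value.

```go
package main

import (
	"fmt"
	"reflect"
)

// cpmsFailureDomain models the CPMS Azure failure domain: the zone is a string.
type cpmsFailureDomain struct {
	Zone string
}

// machineProviderSpec models the current machine v1beta1 Azure spec: the zone is a *string.
type machineProviderSpec struct {
	Zone *string
}

// machineProviderSpecFixed models the proposed fix: the zone is a plain string.
type machineProviderSpecFixed struct {
	Zone string
}

// naiveZonesDiffer injects the failure domain's zone into a provider spec and
// compares it with the machine's spec. A pointer to "" is never DeepEqual to a
// nil pointer, so a zoneless machine always looks different.
func naiveZonesDiffer(fd cpmsFailureDomain, spec machineProviderSpec) bool {
	zone := fd.Zone
	expected := machineProviderSpec{Zone: &zone}
	return !reflect.DeepEqual(expected, spec)
}

// fixedZonesDiffer compares plain strings, so "unset" and "" are the same value.
func fixedZonesDiffer(fd cpmsFailureDomain, spec machineProviderSpecFixed) bool {
	return fd.Zone != spec.Zone
}

func main() {
	// An Azure region without zones: the failure domain has "" and the machine has no zone.
	fd := cpmsFailureDomain{Zone: ""}
	fmt.Println(naiveZonesDiffer(fd, machineProviderSpec{Zone: nil})) // true (spurious diff)
	fmt.Println(fixedZonesDiffer(fd, machineProviderSpecFixed{}))     // false
}
```

Normalizing a nil pointer to "" before comparing would give the same result inside a single component; removing the pointer from the type, as proposed above, gives every consumer a single representation instead.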

Version-Release number of selected component (if applicable):

4.14 nightlies from the merge of 229 onwards

How reproducible:

This only affects Azure regions that have no zones; in CI it currently affects about 20% of events.

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/234

Bug OCPBUGS-19710: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/operator-framework/operator-marketplace/pull/541

Bug OCPBUGS-23473: [4.15] Bootimage bump tracker

Tracker issue for bootimage bump in 4.15. This issue should block issues which need a bootimage bump to fix.

The previous bump was ~~OCPBUGS-22757~~.

https://github.com/openshift/installer/pull/7770

Bug OCPBUGS-23918: Infinite network call to tekton-results API in PAC repository list and details page

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1. Install the Pipeline operator and set up tekton-results on the cluster
    2. Create a PAC repository and trigger a PipelineRun (PLR)
    3. Open the Network tab and visit the Repository list page

Actual results:

    Infinite network calls to the tekton-results API

Expected results:

    The tekton-results API should not be called continuously

Additional info:

https://github.com/openshift/console/pull/13364

Bug OCPBUGS-16666: Move the setting of additionalTrustBundle to InfraEnv

Description of problem:

In https://github.com/openshift/installer/pull/7182, support was added to include AdditionalTrustBundle in the installconfigOverride for assisted-service, to support a Proxy with an AdditionalTrustBundle. Now that https://github.com/openshift/assisted-service/pull/5357 has added it to the assisted-service API, we can stop setting it in installconfigOverride.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/installer/pull/7485

Bug OCPBUGS-20474: Mirroring a manifest-list-based release payload with --to-image-stream uses Legacy importMode and does not honor --keep-manifest-list

Description of problem:

When mirroring a multiarch release payload through oc adm release mirror --keep-manifest-list --to-image-stream into an image stream of a cluster's internal registry, the cluster does not import the image as a manifest list.

Version-Release number of selected component (if applicable):

4.15

How reproducible:

Always

Steps to Reproduce:

1. oc adm release mirror \
                  --from=quay.io/openshift-release-dev/ocp-release:4.14.0-rc.5-multi \
                  --to-image-stream=release \
                  --keep-manifest-list=true
2. oc get istag release:installer -o yaml
3.

Actual results:

apiVersion: image.openshift.io/v1
generation: 1
image:
  dockerImageLayers:
  - mediaType: application/vnd.docker.image.rootfs.diff.tar.gzip
    name: sha256:97da74cc6d8fa5d1634eb1760fd1da5c6048619c264c23e62d75f3bf6b8ef5c4
    size: 79524639
  - mediaType: application/vnd.docker.image.rootfs.diff.tar.gzip
    name: sha256:d8190195889efb5333eeec18af9b6c82313edd4db62989bd3a357caca4f13f0e
    size: 1438
  - mediaType: application/vnd.docker.image.rootfs.diff.tar.gzip
    name: sha256:09c3f3b6718f2df2ee9cd3a6c2e19ddb73ca777f216d310eaf4e0420407ea7c7
    size: 59044444
  - mediaType: application/vnd.docker.image.rootfs.diff.tar.gzip
    name: sha256:cf84754d71b4b704c30abd45668882903e3eaa1355857b605e1dbb25ecf516d7
    size: 11455659
  - mediaType: application/vnd.docker.image.rootfs.diff.tar.gzip
    name: sha256:2e20a50f4b685b3976028637f296ae8839c18a9505b5f58d6e4a0f03984ef1e8
    size: 433281528
  dockerImageManifestMediaType: application/vnd.docker.distribution.manifest.v2+json
  dockerImageMetadata:
    Architecture: amd64
    Config:
      Entrypoint:
      - /bin/openshift-install
      Env:
      - container=oci
      - GODEBUG=x509ignoreCN=0,madvdontneed=1
      - __doozer=merge
      - BUILD_RELEASE=202310100645.p0.gc926532.assembly.stream
      - BUILD_VERSION=v4.15.0
      - OS_GIT_MAJOR=4
      - OS_GIT_MINOR=15
      - OS_GIT_PATCH=0
      - OS_GIT_TREE_STATE=clean
      - OS_GIT_VERSION=4.15.0-202310100645.p0.gc926532.assembly.stream-c926532
      - SOURCE_GIT_TREE_STATE=clean
      - __doozer_group=openshift-4.15
      - __doozer_key=ose-installer
      - OS_GIT_COMMIT=c926532
      - SOURCE_DATE_EPOCH=1696907019
      - SOURCE_GIT_COMMIT=c926532cd50b6ef4974f14dfe3d877a0f7707972
      - SOURCE_GIT_TAG=agent-installer-v4.11.0-dev-preview-2-2165-gc926532cd5
      - SOURCE_GIT_URL=https://github.com/openshift/installer
      - PATH=/bin
      - HOME=/output
      Labels:
        License: GPLv2+
        architecture: x86_64
        build-date: 2023-10-10T10:01:18
        com.redhat.build-host: cpt-1001.osbs.prod.upshift.rdu2.redhat.com
        com.redhat.component: ose-installer-container
        com.redhat.license_terms: https://www.redhat.com/agreements
        description: This is the base image from which all OpenShift Container Platform
          images inherit.
        distribution-scope: public
        io.buildah.version: 1.29.0
        io.k8s.description: This is the base image from which all OpenShift Container
          Platform images inherit.
        io.k8s.display-name: OpenShift Container Platform RHEL 8 Base
        io.openshift.build.commit.id: c926532cd50b6ef4974f14dfe3d877a0f7707972
        io.openshift.build.commit.url: https://github.com/openshift/installer/commit/c926532cd50b6ef4974f14dfe3d877a0f7707972
        io.openshift.build.source-location: https://github.com/openshift/installer
        io.openshift.expose-services: ""
        io.openshift.maintainer.component: Installer / openshift-installer
        io.openshift.maintainer.project: OCPBUGS
        io.openshift.release.operator: "true"
        io.openshift.tags: openshift,base
        maintainer: Red Hat, Inc.
        name: openshift/ose-installer
        release: 202310100645.p0.gc926532.assembly.stream
        summary: Provides the latest release of the Red Hat Extended Life Base Image.
        url: https://access.redhat.com/containers/#/registry.access.redhat.com/openshift/ose-installer/images/v4.15.0-202310100645.p0.gc926532.assembly.stream
        vcs-ref: d40a2800e169f6c2d63897467af22d59933e8811
        vcs-type: git
        vendor: Red Hat, Inc.
        version: v4.15.0
      User: 1000:1000
      WorkingDir: /output
    ContainerConfig: {}
    Created: "2023-10-10T10:59:36Z"
    Id: sha256:ae4c47d3c08de5d57b5d4fa8a30497ac097c05abab4e284c91eae389e512f202
    Size: 583326767
    apiVersion: image.openshift.io/1.0
    kind: DockerImage
  dockerImageMetadataVersion: "1.0"
  dockerImageReference: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:67d35b2185c9f267523f86e54f403d0d2561c9098b7bb81fa3bfd6fd8a121d04
  metadata:
    annotations:
      image.openshift.io/dockerLayersOrder: ascending
    creationTimestamp: "2023-10-11T10:56:53Z"
    name: sha256:67d35b2185c9f267523f86e54f403d0d2561c9098b7bb81fa3bfd6fd8a121d04
    resourceVersion: "740341"
    uid: 17dede63-ca3b-47ad-a157-c78f38c1df7d
kind: ImageStreamTag
lookupPolicy:
  local: true
metadata:
  creationTimestamp: "2023-10-12T09:32:10Z"
  name: release:installer
  namespace: okd-fcos
  resourceVersion: "1329147"
  uid: d6cfcd4d-3f9c-4bb1-bc56-04bf5e926628
tag:
  annotations: null
  from:
    kind: DockerImage
    name: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c510f0e2bd29f7b9bf45146fbc212e893634179cc029cd54a135f05f9ae1df52
  generation: 12
  importPolicy:
    importMode: Legacy
  name: installer
  referencePolicy:
    type: Source

Expected results:

apiVersion: image.openshift.io/v1
generation: 12
image:
  dockerImageManifestMediaType: application/vnd.docker.distribution.manifest.list.v2+json
  dockerImageManifests:
  - architecture: amd64
    digest: sha256:67d35b2185c9f267523f86e54f403d0d2561c9098b7bb81fa3bfd6fd8a121d04
    manifestSize: 1087
    mediaType: application/vnd.docker.distribution.manifest.v2+json
    os: linux
  - architecture: arm64
    digest: sha256:a602c3e4b5f8f747b2813ed2166f366417f638fc6884deecebdb04e18431fcd6
    manifestSize: 1087
    mediaType: application/vnd.docker.distribution.manifest.v2+json
    os: linux
  - architecture: ppc64le
    digest: sha256:04296057a8f037f20d4b1ca20bcaac5bdca5368cdd711a3f37bd05d66c9fdaec
    manifestSize: 1087
    mediaType: application/vnd.docker.distribution.manifest.v2+json
    os: linux
  - architecture: s390x
    digest: sha256:5fda4ea09bfd2026b7d6acd80441b2b7c51b1cf440fd46e0535a7320b67894fb
    manifestSize: 1087
    mediaType: application/vnd.docker.distribution.manifest.v2+json
    os: linux
  dockerImageMetadata:
    ContainerConfig: {}
    Created: "2023-10-12T09:32:03Z"
    Id: sha256:c510f0e2bd29f7b9bf45146fbc212e893634179cc029cd54a135f05f9ae1df52
    apiVersion: image.openshift.io/1.0
    kind: DockerImage
  dockerImageMetadataVersion: "1.0"
  dockerImageReference: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c510f0e2bd29f7b9bf45146fbc212e893634179cc029cd54a135f05f9ae1df52
  metadata:
    creationTimestamp: "2023-10-12T09:32:10Z"
    name: sha256:c510f0e2bd29f7b9bf45146fbc212e893634179cc029cd54a135f05f9ae1df52
    resourceVersion: "1327949"
    uid: 4d78c9ba-12b2-414f-a173-b926ae019ab0
kind: ImageStreamTag
lookupPolicy:
  local: true
metadata:
  creationTimestamp: "2023-10-12T09:32:10Z"
  name: release:installer
  namespace: okd-fcos
  resourceVersion: "1329147"
  uid: d6cfcd4d-3f9c-4bb1-bc56-04bf5e926628
tag:
  annotations: null
  from:
    kind: DockerImage
    name: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c510f0e2bd29f7b9bf45146fbc212e893634179cc029cd54a135f05f9ae1df52
  generation: 12
  importPolicy:
    importMode: PreserveOriginal
  name: installer
  referencePolicy:
    type: Source
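
Comparing the two ImageStreamTag outputs above, the tag's `importPolicy.importMode` is `Legacy` in the actual result, which produced a single amd64 image manifest, and `PreserveOriginal` in the expected result, which keeps the manifest list for amd64, arm64, ppc64le, and s390x. The Go sketch below shows the flag-to-mode mapping that the expected result implies for `--keep-manifest-list`. `ImportMode`, `TagImportPolicy`, and `importPolicyFor` are hypothetical local types and helpers, not `oc`'s code; only the two mode strings are taken from the YAML.

```go
package main

import "fmt"

// ImportMode mirrors the importPolicy.importMode values shown in the YAML above.
type ImportMode string

const (
	ImportModeLegacy           ImportMode = "Legacy"
	ImportModePreserveOriginal ImportMode = "PreserveOriginal"
)

// TagImportPolicy is a hypothetical, trimmed-down stand-in for a tag's import policy.
type TagImportPolicy struct {
	ImportMode ImportMode
}

// importPolicyFor chooses the import mode for a mirrored tag: when the user asked
// to keep manifest lists, the tag preserves the original manifest list instead
// of using Legacy mode, which resolved to a single manifest in the actual result.
func importPolicyFor(keepManifestList bool) TagImportPolicy {
	if keepManifestList {
		return TagImportPolicy{ImportMode: ImportModePreserveOriginal}
	}
	return TagImportPolicy{ImportMode: ImportModeLegacy}
}

func main() {
	fmt.Println(importPolicyFor(true).ImportMode)  // PreserveOriginal
	fmt.Println(importPolicyFor(false).ImportMode) // Legacy
}
```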

Additional info:

https://github.com/openshift/oc/pull/1572

Bug OCPBUGS-24069: Update 4.15 ose-multus-admission-controller-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/multus-admission-controller/pull/77

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/multus-admission-controller/pull/77

Bug OCPBUGS-19875: Console plugin requests show error message with 304 status and "request method or response status code does not allow body"

This issue has been updated to capture a larger ongoing issue around console 304 status responses for plugins. This has been observed for ODF, ACM, MCE, monitoring, and other plugins going back to 4.12. Related links:

Original report from this bug:

Description of problem:

Error logs appear in the console pod logs.

Version-Release number of selected component (if applicable):

% oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.0-0.nightly-2023-09-27-073353   True        False         37m     Cluster version is 4.15.0-0.nightly-2023-09-27-073353

How reproducible:

100% on IPv6 clusters

Steps to Reproduce:

1. % oc -n openshift-console logs console-6fbf69cc49-7jq5b
...
E0928 00:35:24.098808       1 handlers.go:172] GET request for "monitoring-plugin" plugin failed with 304 status code
E0928 00:35:24.098822       1 utils.go:43] Failed sending HTTP response body: http: request method or response status code does not allow body
E0928 00:35:39.611569       1 handlers.go:172] GET request for "monitoring-plugin" plugin failed with 304 status code
E0928 00:35:39.611583       1 utils.go:43] Failed sending HTTP response body: http: request method or response status code does not allow body
E0928 00:35:54.442150       1 handlers.go:172] GET request for "monitoring-plugin" plugin failed with 304 status code
E0928 00:35:54.442167       1 utils.go:43] Failed sending HTTP response body: http: request method or response status code does not allow body

Actual results:

GET request for "monitoring-plugin" plugin failed with 304 status code

Expected results:

no monitoring-plugin related error logs

https://github.com/openshift/console/pull/13272

Story WINC-692: Add `oc debug` functionality for Windows nodes

Description

Windows host process containers are in alpha, as of Kubernetes 1.22. With this new feature, it should be possible to add `oc debug` functionality for Windows nodes. This would help us as developers, and has the potential to be useful for debugging customer issues as well.

Acceptance Criteria

oc debug is usable with a specified debug image when run against Windows nodes. For example, a user can run `oc debug no/e2e-wm-fsxc8 --image=mcr.microsoft.com/powershell:lts-nanoserver-ltsc2022` against a Windows Server 2022 node and get a debug container running on the node.
This functionality is documented

https://github.com/openshift/oc/pull/1524

Bug OCPBUGS-19859: Multus annotation permissions: Certificate duration should be configurable

Description of problem: the duration of the per-node certificates should be configurable.

https://github.com/openshift/multus-cni/pull/191

Bug OCPBUGS-21734: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Bug MGMT-15306: [Staging] [BE] - adding vips manually and then change network to UMN - getting error from BE

Description of the problem:

In staging (BE 2.23.0), after adding API and Ingress VIPs manually and then changing the network to UMN, the BE responds with the error "User Managed Networking cannot be set with API VIP".
After a talk with Nir Magnezi about this: the BE should be able to delete VIPs from the DB when the API receives such a request.
This is a continuation of:

MGMT-14416

~~MGMT-15117~~
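
A hypothetical Go sketch of the behaviour requested above, assuming an update request where a nil VIP list means "not sent" and an empty, non-nil list means "delete the stored VIPs". The Cluster and UpdateParams types, applyUpdate, and the 192.0.2.x addresses are illustrative only and do not reflect the assisted-service API.

```go
package main

import (
	"errors"
	"fmt"
)

// Cluster is a hypothetical, trimmed-down view of the stored cluster record.
type Cluster struct {
	UserManagedNetworking bool
	APIVips               []string
	IngressVips           []string
}

// UpdateParams is a hypothetical update request. A nil slice means "field not
// sent"; a non-nil empty slice means "delete the stored VIPs".
type UpdateParams struct {
	UserManagedNetworking *bool
	APIVips               []string
	IngressVips           []string
}

var errUMNWithVIPs = errors.New("User Managed Networking cannot be set with API VIP")

// applyUpdate merges the request into the stored cluster first, then
// validates, so clearing VIPs and enabling UMN can happen in one request.
func applyUpdate(c Cluster, p UpdateParams) (Cluster, error) {
	if p.APIVips != nil {
		c.APIVips = p.APIVips
	}
	if p.IngressVips != nil {
		c.IngressVips = p.IngressVips
	}
	if p.UserManagedNetworking != nil {
		c.UserManagedNetworking = *p.UserManagedNetworking
	}
	if c.UserManagedNetworking && (len(c.APIVips) > 0 || len(c.IngressVips) > 0) {
		return Cluster{}, errUMNWithVIPs
	}
	return c, nil
}

func main() {
	stored := Cluster{APIVips: []string{"192.0.2.10"}, IngressVips: []string{"192.0.2.11"}}
	umn := true

	// Switching to UMN while VIPs are still stored is rejected.
	_, err := applyUpdate(stored, UpdateParams{UserManagedNetworking: &umn})
	fmt.Println("UMN only:", err)

	// Explicitly clearing the VIPs in the same request succeeds.
	updated, err := applyUpdate(stored, UpdateParams{
		UserManagedNetworking: &umn,
		APIVips:               []string{},
		IngressVips:           []string{},
	})
	fmt.Println("UMN + cleared VIPs:", updated, err)
}
```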

How reproducible:

Steps to reproduce:

1. Add API and Ingress VIPs manually.

2. Change the network to UMN.

Actual results:

Expected results:

https://github.com/openshift/assisted-service/pull/5462

Bug OCPBUGS-18141: disruption_tests: [sig-instrumentation] Prometheus metrics should be available after an upgrade failing

Description of problem:

I'm seeing Prometheus disruption failures in upgrade tests

Version-Release number of selected component (if applicable):

How reproducible:

Sporadically

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/origin/pull/28228

Bug OCPBUGS-19096: Update 4.15 ose-olm-operator-controller image to be consistent with ART

Please review the following PR: https://github.com/openshift/operator-framework-operator-controller/pull/26

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/operator-framework-operator-controller/pull/27

Bug OCPBUGS-21718: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/oauth-proxy/pull/266

Bug OCPBUGS-18641: [vsphere] dual-stack install fails nodes stuck in node.cloudprovider.kubernetes.io/uninitialized

Description of problem:

vSphere dual-stack install fails during bootstrap.
All nodes remain tainted with node.cloudprovider.kubernetes.io/uninitialized.

It looks like cloud-controller-manager cannot find the nodes:

I0906 15:05:22.922183       1 search.go:49] WhichVCandDCByNodeID called but nodeID is empty
E0906 15:05:22.922187       1 nodemanager.go:197] shakeOutNodeIDLookup failed. Err=nodeID is empty

Version-Release number of selected component (if applicable):

4.14.0-0.ci.test-2023-09-06-141839-ci-ln-98f4iqb-latest

How reproducible:

Always

Steps to Reproduce:

1. Install vSphere IPI with OVN dual-stack networking, using the following install-config settings:

platform:
  vsphere:
    apiVIPs:
      - 192.168.134.3
      - fd65:a1a8:60ad:271c::200
    ingressVIPs:
      - 192.168.134.4
      - fd65:a1a8:60ad:271c::201
networking:
  networkType: OVNKubernetes
  machineNetwork:
  - cidr: 192.168.0.0/16
  - cidr: fd65:a1a8:60ad:271c::/64
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  - cidr: fd65:10:128::/56
    hostPrefix: 64
  serviceNetwork:
  - 172.30.0.0/16
  - fd65:172:16::/112

Actual results:

Install fails in bootstrap

Expected results:

Install succeeds

Additional info:

I0906 15:03:21.393629       1 search.go:69] WhichVCandDCByNodeID by UUID
I0906 15:03:21.393632       1 search.go:76] WhichVCandDCByNodeID nodeID: 421b78c3-f8bb-970c-781b-76827306e89e
I0906 15:03:21.406797       1 search.go:208] Found node 421b78c3-f8bb-970c-781b-76827306e89e
I0906 15:03:21.406816       1 search.go:210] Hostname: ci-ln-bllxr6t-c1627-5p7mq-master-2, UUID: 421b78c3-f8bb-970c-781b-76827306e89e
I0906 15:03:21.406830       1 nodemanager.go:159] Discovered VM using normal UUID format
I0906 15:03:21.416168       1 nodemanager.go:268] Adding Hostname: ci-ln-bllxr6t-c1627-5p7mq-master-2
I0906 15:03:21.416218       1 nodemanager.go:438] Adding Internal IP: 192.168.134.60
I0906 15:03:21.416229       1 nodemanager.go:443] Adding External IP: 192.168.134.60
I0906 15:03:21.416244       1 nodemanager.go:349] Found node 421b78c3-f8bb-970c-781b-76827306e89e
I0906 15:03:21.416266       1 nodemanager.go:351] Hostname: ci-ln-bllxr6t-c1627-5p7mq-master-2 UUID: 421b78c3-f8bb-970c-781b-76827306e89e
I0906 15:03:21.416278       1 instances.go:77] instances.NodeAddressesByProviderID() FOUND with 421b78c3-f8bb-970c-781b-76827306e89e
E0906 15:03:21.416326       1 node_controller.go:236] error syncing 'ci-ln-bllxr6t-c1627-5p7mq-master-2': failed to get node modifiers from cloud provider: provided node ip for node "ci-ln-bllxr6t-c1627-5p7mq-master-2" is not valid: failed to get node address from cloud provider that matches ip: fd65:a1a8:60ad:271c::70, requeuing
I0906 15:03:21.623573       1 instances.go:102] instances.InstanceID() CACHED with ci-ln-bllxr6t-c1627-5p7mq-master-1
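
The log shows the cloud provider registering only the IPv4 address 192.168.134.60 for master-2, while the node IP to validate is the IPv6 address fd65:a1a8:60ad:271c::70. A simplified Go sketch of that kind of check (matchNodeIP is hypothetical, not the vSphere cloud provider's code) shows why it fails unless the provider also reports the IPv6 address. Parsing with net/netip, rather than comparing strings, also treats equivalent IPv6 spellings as equal.

```go
package main

import (
	"fmt"
	"net/netip"
)

// matchNodeIP reports whether nodeIP appears among the addresses the cloud
// provider discovered for the VM. Hypothetical simplification of the
// "failed to get node address from cloud provider that matches ip" check.
func matchNodeIP(nodeIP string, cloudAddrs []string) (bool, error) {
	want, err := netip.ParseAddr(nodeIP)
	if err != nil {
		return false, err
	}
	for _, a := range cloudAddrs {
		got, err := netip.ParseAddr(a)
		if err != nil {
			return false, err
		}
		if got == want { // netip.Addr is comparable
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Only the IPv4 address was discovered in the log above.
	ok, _ := matchNodeIP("fd65:a1a8:60ad:271c::70", []string{"192.168.134.60"})
	fmt.Println("IPv4-only provider addresses:", ok)

	// If the provider also reported the IPv6 address, the check would pass.
	ok, _ = matchNodeIP("fd65:a1a8:60ad:271c::70",
		[]string{"192.168.134.60", "fd65:a1a8:60ad:271c::70"})
	fmt.Println("dual-stack provider addresses:", ok)
}
```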

https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/279

Bug OCPBUGS-19129: Update 4.15 openshift-enterprise-cli image to be consistent with ART

Please review the following PR: https://github.com/openshift/oc/pull/1542

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/oc/pull/1542

Bug OCPBUGS-22956: When build capability is disabled, ConfigObserver controller does not run

Description of problem:

The ConfigObserver controller waits until all given informers, including the build informer, are marked as synced. However, when the build capability is disabled, this blocks ConfigObserver and it never runs.

This likely only happens on 4.15 because the capability-watching mechanism was bound to ConfigObserver in 4.15.

Version-Release number of selected component (if applicable):

4.15

How reproducible:

Launch cluster-bot cluster via "launch 4.15.0-0.nightly-2023-11-05-192858,openshift/cluster-openshift-controller-manager-operator#315 no-capabilities"

Steps to Reproduce:

1.
2.
3.

Actual results:

ConfigObserver controller stuck in failure

Expected results:

The ConfigObserver controller runs and successfully clears all deployer service accounts when the DeploymentConfig capability is disabled.

Additional info:

https://github.com/openshift/cluster-openshift-controller-manager-operator/pull/315

Bug OCPBUGS-17652: [alibabacloud] IPI installation on Alibabacloud cannot succeed, and zero control-plane node ready

Description of problem:

IPI installation on Alibabacloud fails, and no control-plane nodes become ready.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-08-11-055332

How reproducible:

Always

Steps to Reproduce:

1. IPI installation on Alibabacloud, with "credentialsMode: Manual"

Actual results:

Bootstrap failed, with all control-plane nodes NotReady.

Expected results:

The installation should succeed.

Additional info:

The log bundle is available at https://drive.google.com/file/d/1eb1D6GeNyu1Bys6vDyf3ev9aFjzWW6lW/view?usp=drive_link.

The installation of exactly the same scenario can succeed with 4.14.0-ec.4-x86_64.

https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/276

Bug OCPBUGS-5471: Installer can choose a worker as Node 0

If the user does not specify a rendezvousIP and instead leaves it to the installer to choose one of the configured static IPs, it always picks the lowest IP. If no roles are assigned, this host will become part of the control plane.

If the user assigns the lowest IP to a host to which they also assign a worker role, the install will fail.

It's not clear what will happen if the role is not explicitly set on the host with the lowest IP, but there are already sufficient control plane nodes assigned from among the other hosts. In any event, this wouldn't be good.

We should select a static IP among only the hosts that are eligible to become part of the control plane.

A user can work around this by explicitly specifying the rendezvousIP.
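
A Go sketch of the proposed selection rule, assuming a simplified Host type in which an empty Role means "not set". pickRendezvousIP is hypothetical and is not the agent installer's implementation. It skips hosts explicitly assigned the worker role and picks the lowest remaining IP; it does not address the open question above about unset roles when enough control plane hosts are already assigned.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// Host is a hypothetical, simplified host entry. Role is "master", "worker",
// or "" when the user did not set one.
type Host struct {
	Role     string
	StaticIP string
}

// pickRendezvousIP returns the lowest static IP among hosts that are eligible
// for the control plane, i.e. hosts not explicitly assigned the worker role.
func pickRendezvousIP(hosts []Host) (netip.Addr, error) {
	var best netip.Addr // zero value: not yet set
	for _, h := range hosts {
		if h.Role == "worker" {
			continue // a worker must never become node 0
		}
		ip, err := netip.ParseAddr(h.StaticIP)
		if err != nil {
			return netip.Addr{}, fmt.Errorf("host with IP %q: %w", h.StaticIP, err)
		}
		if !best.IsValid() || ip.Less(best) {
			best = ip
		}
	}
	if !best.IsValid() {
		return netip.Addr{}, errors.New("no control-plane-eligible host has a static IP")
	}
	return best, nil
}

func main() {
	hosts := []Host{
		{Role: "worker", StaticIP: "192.168.111.20"}, // lowest IP, but a worker
		{Role: "", StaticIP: "192.168.111.22"},
		{Role: "master", StaticIP: "192.168.111.21"},
	}
	ip, err := pickRendezvousIP(hosts)
	fmt.Println(ip, err) // 192.168.111.21 <nil>
}
```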

https://github.com/openshift/installer/pull/7443

Bug OCPBUGS-24668: [release-4.15] VPAs from different projects are shown under one deployment "Resources" tab

This is a clone of issue OCPBUGS-23925. The following is the description of the original issue:
—
Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

Bug OCPBUGS-15934: logSizeMax automatically applied to containerRuntimeConfig even if not specified

Description of problem:

According to https://docs.openshift.com/container-platform/4.11/release_notes/ocp-4-11-release-notes.html#ocp-4-11-deprecated-features-crio-parameters and Red Hat Insights, logSizeMax is deprecated in ContainerRuntimeConfig and shall instead be created via containerLogMaxSize in KubeletConfig.

When starting that transition, though, we noticed that a ContainerRuntimeConfig such as the one shown below still gets logSizeMax and even overlaySize added to its spec.

$ bat /tmp/crio.yaml 
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
 name: pidlimit
spec:
 machineConfigPoolSelector:
   matchLabels:
     pools.operator.machineconfiguration.openshift.io/worker: '' 
 containerRuntimeConfig:
   pidsLimit: 4096 
   logLevel: debug

$ oc get containerruntimeconfig  pidlimit -o json | jq '.spec.containerRuntimeConfig'
{
  "logLevel": "debug",
  "logSizeMax": "0",
  "overlaySize": "0",
  "pidsLimit": 4096
}

When checking on an OpenShift Container Platform 4 node using `crio config`, we can see that the values are not applied. Still, it's confusing to see those options added to the specification when Red Hat in fact recommends moving them into KubeletConfig and removing them from ContainerRuntimeConfig.

Further, having them still set in ContainerRuntimeConfig triggers a false-positive alert in Red Hat Insights: the customer may have followed the recommendation, but the system does not reflect the changes made :-)

Interestingly, a similar problem was reported a while ago in https://bugzilla.redhat.com/show_bug.cgi?id=1941936 and fixed, so it is surprising that it is coming back.

Version-Release number of selected component (if applicable):

OpenShift Container Platform 4.13.4

How reproducible:

Always

Steps to Reproduce:

1. Install OpenShift Container Platform 4.13.4
2. Create ContainerRuntimeConfig as shown above and validate the actual object created
3. Run oc get containerruntimeconfig  pidlimit -o json | jq '.spec.containerRuntimeConfig' to validate the object created and inspect the spec.

Actual results:

$ oc get containerruntimeconfig  pidlimit -o json | jq '.spec.containerRuntimeConfig'
{
  "logLevel": "debug",
  "logSizeMax": "0",
  "overlaySize": "0",
  "pidsLimit": 4096
}

Expected results:

$ oc get containerruntimeconfig  pidlimit -o json | jq '.spec.containerRuntimeConfig'
{
  "logLevel": "debug",
  "pidsLimit": 4096
}
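
One common way this symptom arises in Go types (not confirmed as the root cause addressed by the linked PR): omitempty has no effect on struct-typed fields, so a quantity field held by value is always serialized, while a pointer field is dropped when nil. The sketch below uses a stand-in Quantity type whose zero value marshals as "0", loosely modelled on resource.Quantity; the spec types are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Quantity is a stand-in for a struct-typed quantity whose zero value
// serializes as "0".
type Quantity struct{ s string }

func (q Quantity) MarshalJSON() ([]byte, error) {
	if q.s == "" {
		return []byte(`"0"`), nil
	}
	return json.Marshal(q.s)
}

// specByValue: omitempty is ignored for struct values, so zero quantities
// are still emitted.
type specByValue struct {
	LogLevel    string   `json:"logLevel,omitempty"`
	LogSizeMax  Quantity `json:"logSizeMax,omitempty"`
	OverlaySize Quantity `json:"overlaySize,omitempty"`
	PidsLimit   *int64   `json:"pidsLimit,omitempty"`
}

// specByPointer: a nil pointer counts as empty, so unset fields are dropped.
type specByPointer struct {
	LogLevel    string    `json:"logLevel,omitempty"`
	LogSizeMax  *Quantity `json:"logSizeMax,omitempty"`
	OverlaySize *Quantity `json:"overlaySize,omitempty"`
	PidsLimit   *int64    `json:"pidsLimit,omitempty"`
}

func marshal(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	pids := int64(4096)
	fmt.Println(marshal(specByValue{LogLevel: "debug", PidsLimit: &pids}))
	// {"logLevel":"debug","logSizeMax":"0","overlaySize":"0","pidsLimit":4096}
	fmt.Println(marshal(specByPointer{LogLevel: "debug", PidsLimit: &pids}))
	// {"logLevel":"debug","pidsLimit":4096}
}
```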

Additional info:

https://github.com/openshift/machine-config-operator/pull/4044

Bug OCPBUGS-16801: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/oc-mirror/pull/774

Bug OCPBUGS-21638: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Bug OCPBUGS-21839: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cloud-provider-aws/pull/51

Bug OCPBUGS-19215: Update 4.15 ose-ibm-vpc-block-csi-driver-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/ibm-vpc-block-csi-driver-operator/pull/78

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/ibm-vpc-block-csi-driver-operator/pull/78

Bug OCPBUGS-23913: machine-api-controller stuck in CrashLoopBackOff

Description of problem: Panic on machine-controller

2023-11-23T18:18:47.899851056Z I1123 18:18:47.899752       1 controller.go:115]  "msg"="Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference" "controller"="machine-controller" "name"="bogus-6121tjfqk-cpr4v" "namespace"="openshift-machine-api" "object"={"name":"bogus-6121tjfqk-cpr4v","namespace":"openshift-machine-api"} "reconcileID"="38050b3e-3313-4500-8955-59f6822fd650"
2023-11-23T18:18:47.901976792Z panic: runtime error: invalid memory address or nil pointer dereference [recovered]
2023-11-23T18:18:47.901976792Z 	panic: runtime error: invalid memory address or nil pointer dereference
2023-11-23T18:18:47.901976792Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x27fcb31]
2023-11-23T18:18:47.902001202Z 
2023-11-23T18:18:47.902001202Z goroutine 261 [running]:
2023-11-23T18:18:47.902001202Z sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
2023-11-23T18:18:47.902001202Z 	/go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116 +0x1fa
2023-11-23T18:18:47.902013625Z panic({0x2ab4640, 0x4373ed0})
2023-11-23T18:18:47.902022923Z 	/usr/lib/golang/src/runtime/panic.go:884 +0x213
2023-11-23T18:18:47.902043867Z github.com/openshift/machine-api-provider-openstack/pkg/machine.extractRootVolumeFromProviderSpec(...)
2023-11-23T18:18:47.902043867Z 	/go/src/sigs.k8s.io/cluster-api-provider-openstack/pkg/machine/convert.go:211
2023-11-23T18:18:47.902053364Z github.com/openshift/machine-api-provider-openstack/pkg/machine.(*OpenstackClient).Delete(0xc0000bfab0, {0x3113ff0?, 0xc000605ec0?}, 0xc00065fd40)
2023-11-23T18:18:47.902062370Z 	/go/src/sigs.k8s.io/cluster-api-provider-openstack/pkg/machine/actuator.go:335 +0x1b1
2023-11-23T18:18:47.902082577Z github.com/openshift/machine-api-operator/pkg/controller/machine.(*ReconcileMachine).Reconcile(0xc000304aa0, {0x3113ff0, 0xc000605ec0}, {{{0xc000d66a50?, 0x0?}, {0xc000d66a38?, 0xc00043cd48?}}})
2023-11-23T18:18:47.902117667Z 	/go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/github.com/openshift/machine-api-operator/pkg/controller/machine/controller.go:216 +0x1dee
2023-11-23T18:18:47.902139450Z sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x31181b8?, {0x3113ff0?, 0xc000605ec0?}, {{{0xc000d66a50?, 0xb?}, {0xc000d66a38?, 0x0?}}})
2023-11-23T18:18:47.902166210Z 	/go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119 +0xc8
2023-11-23T18:18:47.902186773Z sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0005488c0, {0x3113f48, 0xc000350550}, {0x2b9b6a0?, 0xc000475760?})
2023-11-23T18:18:47.902196557Z 	/go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316 +0x3ca
2023-11-23T18:18:47.902205655Z sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0005488c0, {0x3113f48, 0xc000350550})
2023-11-23T18:18:47.902214747Z 	/go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266 +0x1d9
2023-11-23T18:18:47.902223782Z sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
2023-11-23T18:18:47.902223782Z 	/go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227 +0x85
2023-11-23T18:18:47.902233237Z created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
2023-11-23T18:18:47.902242150Z 	/go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:223 +0x587

The bogus machine bogus-6121tjfqk-cpr4v was created by openstack-test "[sig-installer][Suite:openshift/openstack] Bugfix bz_2073398: [Serial] MachineSet scale-in does not leak OpenStack ports" which was run before and passed.

Version-Release number of selected component (if applicable):

Network_Type: OVNKubernetes
osp_puddle: RHOS-17.1-RHEL-9-20231102.n.1
ocp_puddle: 4.15.0-0.nightly-2023-11-20-205649

How reproducible: Observed once.
Additional info: must-gather provided on private comment
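
The trace shows the panic inside extractRootVolumeFromProviderSpec while deleting the machine. A hedged Go sketch of the defensive pattern, using hypothetical simplified types (the real provider-spec types and the linked fix may differ): check every pointer on the path before dereferencing it, and skip root-volume cleanup when none is configured.

```go
package main

import "fmt"

// RootVolume and ProviderSpec are hypothetical, heavily simplified stand-ins
// for the OpenStack provider-spec types.
type RootVolume struct {
	SizeGiB int
}

type ProviderSpec struct {
	Flavor     string
	RootVolume *RootVolume // nil when the machine boots from an image
}

// extractRootVolume returns the root volume, if any, guarding against both a
// nil spec and a nil RootVolume instead of dereferencing them.
func extractRootVolume(spec *ProviderSpec) (*RootVolume, bool) {
	if spec == nil || spec.RootVolume == nil {
		return nil, false
	}
	return spec.RootVolume, true
}

// deleteSteps lists the cleanup a hypothetical Delete would perform; the
// root-volume step only appears when a root volume is configured.
func deleteSteps(machine string, spec *ProviderSpec) []string {
	steps := []string{"delete server " + machine}
	if rv, ok := extractRootVolume(spec); ok {
		steps = append(steps, fmt.Sprintf("delete root volume (%d GiB)", rv.SizeGiB))
	}
	return steps
}

func main() {
	fmt.Println(deleteSteps("bogus-6121tjfqk-cpr4v", nil))
	fmt.Println(deleteSteps("worker-0", &ProviderSpec{RootVolume: &RootVolume{SizeGiB: 25}}))
}
```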

https://github.com/openshift/machine-api-provider-openstack/pull/98

Bug OCPBUGS-24451: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-olm-operator/pull/36

Bug MGMT-15680: Infraenv controller should reconcile pull secret changes

Description of the problem:

A user with an invalid pull secret cannot correct the issue without deleting the infraenv

How reproducible:

100%

Steps to reproduce:

1. Create a malformed pull secret (like this one)

kind: Secret
apiVersion: v1
metadata:
  name: pullsecret
data:
  '.dockerconfigjson': eyJhdXRocyI6eyJub3RoaW5nLmNvbSI6eyJhdXRoIjoiWTJsaGJ3PT09PSIsImVtYWlsIjoiZmFrZUBjaWFvLmNvbSJ9fX0=
type: 'kubernetes.io/dockerconfigjson'

2. Create an infraenv referencing this secret as the pull secret

3. Correct the pull secret

Actual results:

The InfraEnv still shows an error message about a malformed pull secret.

Expected results:

Infraenv uses the updated pull secret
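
One way for a controller to pick up such changes is to watch Secrets and map each Secret event to the InfraEnvs that reference it, similar in spirit to controller-runtime's handler.EnqueueRequestsFromMapFunc. The Go sketch below is dependency-free; NamespacedName, InfraEnv, and infraEnvsForSecret are hypothetical simplifications, not the assisted-service types.

```go
package main

import (
	"fmt"
	"sort"
)

// NamespacedName identifies a namespaced object.
type NamespacedName struct{ Namespace, Name string }

// InfraEnv is a hypothetical, trimmed view holding only the pull-secret reference.
type InfraEnv struct {
	NamespacedName
	PullSecretName string
}

// infraEnvsForSecret maps a Secret change event to the InfraEnvs in the same
// namespace that reference it, so each one can be re-queued for reconciliation.
func infraEnvsForSecret(secret NamespacedName, envs []InfraEnv) []NamespacedName {
	var out []NamespacedName
	for _, e := range envs {
		if e.Namespace == secret.Namespace && e.PullSecretName == secret.Name {
			out = append(out, e.NamespacedName)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func main() {
	envs := []InfraEnv{
		{NamespacedName{"assisted", "infraenv-a"}, "pullsecret"},
		{NamespacedName{"assisted", "infraenv-b"}, "other-secret"},
	}
	// Correcting "pullsecret" should re-queue only infraenv-a.
	fmt.Println(infraEnvsForSecret(NamespacedName{"assisted", "pullsecret"}, envs))
}
```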

https://github.com/openshift/assisted-service/pull/5589

Bug OCPBUGS-19144: Update 4.15 ose-powervs-machine-controllers image to be consistent with ART

Please review the following PR: https://github.com/openshift/machine-api-provider-powervs/pull/51

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/machine-api-provider-powervs/pull/51

Bug OCPBUGS-24330: Update 4.15 ose-csi-snapshot-validation-webhook-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/csi-external-snapshotter/pull/122

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/csi-external-snapshotter/pull/124

Bug OCPBUGS-19191: Update 4.15 ose-baremetal-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/baremetal-operator/pull/302

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/baremetal-operator/pull/302

Bug OCPBUGS-23756: After PatternFly5 update: YAML editor view shortcuts text and icon is misaligned

Issue 19 from https://docs.google.com/spreadsheets/d/1TR3ENY-GE_LQL9F-xH6NahtRHdu6IrYGhLqDDA8y0EI/edit#gid=1035185624

Horizontal alignment is slightly off between text and icon

Screenshot: https://drive.google.com/file/d/1nzFHCeorlVIMbwlnjzEc1fCW0GXQa1KT/view

https://github.com/openshift/console/pull/13374

Bug OCPBUGS-19160: Update 4.15 ose-cluster-kube-apiserver-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-kube-apiserver-operator/pull/1550

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-kube-apiserver-operator/pull/1550

Bug OCPBUGS-23923: Pipelinerun task logs switcher not working

Description of problem:

Pipelinerun task log switcher is stuck and is not loading the respective task logs when you switch from one task to another.

Version-Release number of selected component (if applicable):

4.15.0

How reproducible:

Always

Steps to Reproduce:

    1. Create a pipeline with multiple tasks.
    2. Start the pipeline and go to the logs page.
    3. Switch between the tasks to see their logs.

Actual results:
Unable to click a task on the left-hand side, and the logs window shows a blank screen.

Expected results:

Should be able to switch between the tasks and selected task logs should be shown in the log window

Attached Video:

https://drive.google.com/file/d/1pPQm9YYyWZxfCwFnudviSCyqoPHn8D9x/view?usp=sharing

https://github.com/openshift/console/pull/13369

Bug OCPBUGS-17289: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-api-operator/pull/25

Bug OCPBUGS-19100: Update 4.15 ose-csi-snapshot-controller image to be consistent with ART

Please review the following PR: https://github.com/openshift/csi-external-snapshotter/pull/104

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/csi-external-snapshotter/pull/104

Story ETCD-187: Create dashboard that shows CPU iowait on master nodes

This came out of https://bugzilla.redhat.com/show_bug.cgi?id=1943704.

Add a dashboard for CPU iowait on master nodes. This will help customers, customer support, and us identify problems that result in leader elections; we often see those caused by high iowait, aligning with large spikes in fsync and/or peer-to-peer latency.

Query:

(sum(irate(node_cpu_seconds_total {mode="iowait"} [2m])) without (cpu)) / count(node_cpu_seconds_total) without (cpu) * 100
AND on (instance) label_replace( kube_node_role{role="master"}, "instance", "$1", "node", "(.+)" )
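
Written as a formula (a reading of the query above, not part of the original story): for each master instance i, let r_{i,c} be the irate over the 2m range (computed from the last two samples) of node_cpu_seconds_total{mode="iowait"} for CPU c, in seconds of iowait per second. Then:

```latex
\[
\mathrm{iowait}_{\%}(i) = \frac{100}{N_{\mathrm{cpu}}(i)} \sum_{c=1}^{N_{\mathrm{cpu}}(i)} r_{i,c}
\]
```

Because count(...) without (cpu) keeps the mode label, the division only matches the mode="iowait" series, so the denominator is the CPU count N_cpu(i). The AND on (instance) clause keeps only instances whose name matches a node with role master (label_replace copies the node label into instance). For example, four CPUs with rates 0.10, 0.05, 0.00 and 0.25 give 100 * 0.40 / 4 = 10%.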

https://github.com/openshift/cluster-etcd-operator/pull/1119

Bug OCPBUGS-17866: GOOGLE_APPLICATION_CREDENTIALS is skipped for env vars

Description of problem:

According to https://cloud.google.com/docs/authentication/provide-credentials-adc#local-key, the default way to provide application credentials is to set GOOGLE_APPLICATION_CREDENTIALS. Currently, this variable is missing from the list of environment variables that are checked.
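
A small Go sketch of an ordered environment-variable lookup that includes GOOGLE_APPLICATION_CREDENTIALS. Per Google's documentation, that variable holds the path to a credentials JSON file. The other entries in credentialEnvVars and the helper findCredentialsEnv are illustrative assumptions, not the installer's actual list or code.

```go
package main

import (
	"fmt"
	"os"
)

// credentialEnvVars lists variables checked, in order, for GCP credentials.
// Only GOOGLE_APPLICATION_CREDENTIALS comes from the bug report; the other
// entries are assumptions for illustration.
var credentialEnvVars = []string{
	"GOOGLE_CREDENTIALS",             // assumed
	"GOOGLE_CLOUD_KEYFILE_JSON",      // assumed
	"GOOGLE_APPLICATION_CREDENTIALS", // Google's documented ADC variable
}

// findCredentialsEnv returns the first non-empty variable and its value.
// lookup has the signature of os.LookupEnv so tests can substitute a fake.
func findCredentialsEnv(lookup func(string) (string, bool)) (name, value string, ok bool) {
	for _, n := range credentialEnvVars {
		if v, found := lookup(n); found && v != "" {
			return n, v, true
		}
	}
	return "", "", false
}

func main() {
	name, _, ok := findCredentialsEnv(os.LookupEnv)
	if !ok {
		fmt.Println("no GCP credentials environment variable is set")
		return
	}
	fmt.Println("using credentials from", name)
}
```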

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/installer/pull/6863

Bug OCPBUGS-18162: [Multi-NIC]EgressIP was not correctly reassigned when label/unlabel egress node

Description of problem:

[Multi-NIC]EgressIP was not correctly reassigned when label/unlabel egress node

Version-Release number of selected component (if applicable):

Tested PRs openshift/cluster-network-operator#1969 and openshift/ovn-kubernetes#1832 together

How reproducible:

Steps to Reproduce:

1. Label the worker-0 node as an egress node, and create one EgressIP object:
# oc get egressip
NAME         EGRESSIPS      ASSIGNED NODE   ASSIGNED EGRESSIPS
egressip-1   172.22.0.100   worker-0        172.22.0.100

2. Create another EgressIP object; its egress IP is also assigned to worker-0:
# oc get egressip
NAME         EGRESSIPS      ASSIGNED NODE   ASSIGNED EGRESSIPS
egressip-1   172.22.0.100   worker-0        172.22.0.100
egressip-2   172.22.0.101   worker-0        172.22.0.101

3. Check the secondary NIC on the egress node; the two IPs were correctly added:
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:da:86:9b:3e:ac brd ff:ff:ff:ff:ff:ff
    inet 172.22.0.86/24 brd 172.22.0.255 scope global dynamic noprefixroute enp1s0
       valid_lft 96sec preferred_lft 96sec
    inet 172.22.0.100/32 scope global enp1s0ovn
       valid_lft forever preferred_lft forever
    inet 172.22.0.101/32 scope global enp1s0ovn
       valid_lft forever preferred_lft forever
    inet6 fe80::2da:86ff:fe9b:3eac/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

4. Label another node, worker-1, as an egress node.
5. Delete egressip-2 and recreate it; egressip-2 is now assigned to worker-1:
# oc get egressip
NAME         EGRESSIPS      ASSIGNED NODE   ASSIGNED EGRESSIPS
egressip-1   172.22.0.100   worker-0        172.22.0.100
egressip-2   172.22.0.101   worker-1        172.22.0.101
6. Remove the egress label from worker-1; 172.22.0.101 was reassigned to worker-0:
# oc get egressip
NAME         EGRESSIPS      ASSIGNED NODE   ASSIGNED EGRESSIPS
egressip-1   172.22.0.100   worker-0        172.22.0.100
egressip-2   172.22.0.101   worker-0        172.22.0.101

7. Check the secondary NICs on worker-0 and worker-1.

Actual results:

The egress IP 172.22.0.101 was not removed from worker-1:
# oc debug node/worker-1
Starting pod/worker-1-debug-pw7xk ...
To use host binaries, run `chroot /host`
Pod IP: 192.168.111.24
If you don't see a command prompt, try pressing enter.
sh-4.4# ip a show enp1s0
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:da:86:9b:3e:b0 brd ff:ff:ff:ff:ff:ff
    inet 172.22.0.90/24 brd 172.22.0.255 scope global dynamic noprefixroute enp1s0
       valid_lft 115sec preferred_lft 115sec
    inet 172.22.0.101/32 scope global enp1s0ovn
       valid_lft forever preferred_lft forever
    inet6 fe80::2da:86ff:fe9b:3eb0/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

172.22.0.100 was missing from worker-0:
# oc debug node/worker-0
Starting pod/worker-0-debug-8nz5f ...
To use host binaries, run `chroot /host`
Pod IP: 192.168.111.23
If you don't see a command prompt, try pressing enter.
sh-4.4# ip a show enp1s0
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:da:86:9b:3e:ac brd ff:ff:ff:ff:ff:ff
    inet 172.22.0.86/24 brd 172.22.0.255 scope global dynamic noprefixroute enp1s0
       valid_lft 68sec preferred_lft 68sec
    inet 172.22.0.101/32 scope global enp1s0ovn
       valid_lft forever preferred_lft forever
    inet6 fe80::2da:86ff:fe9b:3eac/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

Expected results:

The egress IPs should be correctly reassigned to the correct egress nodes.
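
The expected behaviour amounts to a reconcile loop that diffs the desired assignment against what is actually configured on each node's secondary NIC. A generic Go sketch (the Assignment type, set and diff are hypothetical, not the ovn-kubernetes implementation), fed with the state observed after step 6 above, yields exactly the two missing operations: add 172.22.0.100 on worker-0 and remove 172.22.0.101 from worker-1.

```go
package main

import (
	"fmt"
	"sort"
)

// Assignment maps a node name to the set of egress IPs on its secondary NIC.
type Assignment map[string]map[string]bool

// set builds an IP set from a list.
func set(ips ...string) map[string]bool {
	s := make(map[string]bool, len(ips))
	for _, ip := range ips {
		s[ip] = true
	}
	return s
}

// diff returns, per node, the IPs to add and to remove so that observed
// converges to desired. Indexing a missing inner map yields false, so nodes
// absent from either side are handled naturally.
func diff(desired, observed Assignment) (add, remove map[string][]string) {
	add, remove = map[string][]string{}, map[string][]string{}
	for node, ips := range desired {
		for ip := range ips {
			if !observed[node][ip] {
				add[node] = append(add[node], ip)
			}
		}
	}
	for node, ips := range observed {
		for ip := range ips {
			if !desired[node][ip] {
				remove[node] = append(remove[node], ip)
			}
		}
	}
	for _, m := range []map[string][]string{add, remove} {
		for node := range m {
			sort.Strings(m[node]) // deterministic output
		}
	}
	return add, remove
}

func main() {
	// After step 6 both EgressIPs belong on worker-0.
	desired := Assignment{"worker-0": set("172.22.0.100", "172.22.0.101"), "worker-1": set()}
	// What the NICs actually had in the report.
	observed := Assignment{"worker-0": set("172.22.0.101"), "worker-1": set("172.22.0.101")}

	add, remove := diff(desired, observed)
	fmt.Println("add:   ", add)
	fmt.Println("remove:", remove)
}
```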

Additional info:

https://github.com/openshift/ovn-kubernetes/pull/1911

Bug OCPBUGS-24167: Update 4.15 ose-cluster-ingress-operator-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-ingress-operator/pull/1002

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-ingress-operator/pull/1002

Bug OCPBUGS-23743: Bump router to Kubernetes 1.28 for 4.15

Description of problem

The openshift/router repository vendors k8s.io/* v0.27.2. OpenShift 4.15 is based on Kubernetes 1.28.

Version-Release number of selected component (if applicable)

4.15.

How reproducible

Always.

Steps to Reproduce

Check https://github.com/openshift/router/blob/release-4.15/go.mod.

Actual results

The k8s.io/* packages are at v0.27.2.

Expected results

The k8s.io/* packages are at v0.28.0 or newer.

https://github.com/openshift/router/pull/542

Bug OCPBUGS-18662: cnf-tests: [test_id: 55012] RPS configuration applied on some physical devices

Description of problem:
RPS configuration test failed with the following error:

[FAILED] Failure recorded during attempt 1:
a host device rps mask is different from the reserved CPUs; have "0" want ""
Expected
    <bool>: false
to be true
In [It] at: /tmp/cnf-ZdGbI/cnf-features-deploy/vendor/github.com/onsi/gomega/internal/assertion.go:62 @ 09/06/23 03:47:44.144
< Exit [It] [test_id:55012] Should have the correct RPS configuration - /tmp/cnf-ZdGbI/cnf-features-deploy/vendor/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/1_performance/performance.go:337 @ 09/06/23 03:47:44.144 (39.949s)

Full report:

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-telco5g-cnftests/1699249554244767744/artifacts/e2e-telco5g-cnftests/telco5g-cnf-tests/artifacts/test_results.html

How reproducible:

Very often

Steps to Reproduce:
1. Run the cnf-tests nightly job; it reproduces the failure automatically.

Actual results:
Some of the virtual devices are not configured with the correct RPS mask

Expected results:
All virtual network devices are expected to have the correct RPS mask
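How an expected RPS mask relates to a CPU set can be illustrated with a small Go sketch that converts a CPU list (the cpuset format used, for example, for reserved CPUs in a PerformanceProfile, such as "0-3,8") into the hex bitmask value space of rps_cpus. The helper names are hypothetical, this is not the cnf-tests implementation, and which CPU set a given device is expected to carry is not modeled here.

```go
package main

import (
	"fmt"
	"math/big"
	"strconv"
	"strings"
)

// cpuListToMask converts a Linux CPU list such as "0-3,8" into a hex bitmask
// in which bit i is set when CPU i is in the list, the value space of
// /sys/class/net/<dev>/queues/rx-<n>/rps_cpus. Simplification: the kernel
// prints the mask zero-padded in comma-separated 32-bit groups, while this
// helper returns one unpadded hex string (use normalizeMask to compare).
func cpuListToMask(list string) (string, error) {
	mask := new(big.Int)
	list = strings.TrimSpace(list)
	if list == "" {
		return "0", nil
	}
	for _, part := range strings.Split(list, ",") {
		part = strings.TrimSpace(part)
		lo, hi, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(lo)
		if err != nil {
			return "", fmt.Errorf("bad CPU %q: %w", part, err)
		}
		end := start
		if isRange {
			if end, err = strconv.Atoi(hi); err != nil {
				return "", fmt.Errorf("bad CPU range %q: %w", part, err)
			}
		}
		if start < 0 || end < start {
			return "", fmt.Errorf("invalid CPU range %q", part)
		}
		for cpu := start; cpu <= end; cpu++ {
			mask.SetBit(mask, cpu, 1)
		}
	}
	return mask.Text(16), nil
}

// normalizeMask strips group separators and leading zeros from a mask read
// from sysfs, e.g. "00000000,0000010f" -> "10f".
func normalizeMask(s string) string {
	s = strings.TrimLeft(strings.ReplaceAll(strings.TrimSpace(s), ",", ""), "0")
	if s == "" {
		return "0"
	}
	return s
}

func main() {
	want, err := cpuListToMask("0-3,8")
	if err != nil {
		fmt.Println(err)
		return
	}
	have := normalizeMask("00000000,0000010f")
	fmt.Println(want, have, want == have)
}
```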

Bug OCPBUGS-19178: Update 4.15 baremetal-machine-controller image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-api-provider-baremetal/pull/196

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-api-provider-baremetal/pull/196

Bug OCPBUGS-22425: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/ironic-image/pull/420

Bug OCPBUGS-18863: Update 4.15 ironic-rhcos-downloader image to be consistent with ART

Please review the following PR: https://github.com/openshift/ironic-rhcos-downloader/pull/93

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/ironic-rhcos-downloader/pull/93

Bug OCPBUGS-19187: Update 4.15 ose-libvirt-machine-controllers image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-api-provider-libvirt/pull/262

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-api-provider-libvirt/pull/262

Bug OCPBUGS-15900: TestMTLSWithCRLs e2e test failures panic

Description of problem:

When a curl command in the TestMTLSWithCRLs e2e test fails, the test checks stdout to report the error, but stdout can be empty, so the test panics:

 --- FAIL: TestAll/parallel/TestMTLSWithCRLs (97.09s)
            --- FAIL: TestAll/parallel/TestMTLSWithCRLs/certificate-distributes-its-own-crl (97.09s)
panic: runtime error: slice bounds out of range [-3:] [recovered]
	panic: runtime error: slice bounds out of range [-3:]

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Whenever the mTLS test hits a failure, such as the one seen in https://redhat-internal.slack.com/archives/CBWMXQJKD/p1688596054069399?thread_ts=1688596036.042119&cid=CBWMXQJKD

Search.ci shows two failures in the past two weeks: https://search.ci.openshift.org/?search=FAIL%3A+TestAll%2Fparallel%2FTestMTLSWithCRLs&maxAge=336h&context=1&type=bug%2Bissue%2Bjunit&name=cluster-ingress-operator&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Steps to Reproduce:

1. N/A

Actual results:

Test panics when trying to report an error.

Expected results:

The test reports whatever error information is available without panicking.

Additional info:

stdout was empty, but https://github.com/openshift/cluster-ingress-operator/blob/4c92a6d1ee80b6b120dd750855a40145a530153c/test/e2e/client_tls_test.go#L1587 doesn't check that the value is empty before it tries to index it.
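The [-3:] bound suggests the test slices off the last three characters of stdout. Assuming that, a minimal Go sketch of a bounds-safe version; lastN is a hypothetical helper, and the fix in the linked PR may take a different approach.

```go
package main

import "fmt"

// lastN returns at most the last n bytes of s. Writing s[len(s)-n:] directly
// panics with "slice bounds out of range [-3:]" when s is empty and n is 3,
// which matches the panic in this report.
func lastN(s string, n int) string {
	if n <= 0 {
		return ""
	}
	if len(s) <= n {
		return s
	}
	return s[len(s)-n:]
}

func main() {
	fmt.Printf("%q\n", lastN("", 3))
	fmt.Printf("%q\n", lastN("response body 200", 3))
}
```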

https://github.com/openshift/cluster-ingress-operator/pull/973

Bug OCPBUGS-21719: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-authentication-operator/pull/635

Bug OCPBUGS-19073: [4.15 HCP] label missing for aws-ebs-csi-driver-operator in HCP Guest cluster

Description of problem:

The hypershift.openshift.io/need-management-kas-access label is missing from the aws-ebs-csi-driver-operator pod template in the HCP guest cluster.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-09-12-195514

How reproducible:

Always

Steps to Reproduce:

1. Install Hypershift kind cluster from flexy template
   aos-4_14/ipi-on-aws/versioned-installer-ovn-hypershift-ci

2. Check the pod template labels of the aws-ebs-csi-driver-operator deployment:

oc get deployment/aws-ebs-csi-driver-operator -n clusters-hypershift-ci-3366 -o jsonpath='{.spec.template.metadata.labels}'
{"name":"aws-ebs-csi-driver-operator"}

Actual results:

{"name":"aws-ebs-csi-driver-operator"}

Expected results:

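The pod template labels include "hypershift.openshift.io/need-management-kas-access":"true", as they do for cluster-storage-operator.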

Additional info:

oc get deployment/cluster-storage-operator -n clusters-hypershift-ci-3366 -o jsonpath='{.spec.template.metadata.labels}'
{"hypershift.openshift.io/hosted-control-plane":"clusters-hypershift-ci-3366","hypershift.openshift.io/need-management-kas-access":"true","name":"cluster-storage-operator"}
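The difference between the two deployments can be checked programmatically. This is a minimal Go sketch, assuming the labels JSON is captured from the oc commands above; hasKASAccessLabel is a hypothetical helper, not part of HyperShift.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// kasAccessLabel is present on cluster-storage-operator's pod template in this
// report and missing on aws-ebs-csi-driver-operator's.
const kasAccessLabel = "hypershift.openshift.io/need-management-kas-access"

// hasKASAccessLabel parses the JSON printed by
// oc get deployment/<name> -o jsonpath='{.spec.template.metadata.labels}'
// and reports whether the label is set to "true".
func hasKASAccessLabel(labelsJSON string) (bool, error) {
	var labels map[string]string
	if err := json.Unmarshal([]byte(labelsJSON), &labels); err != nil {
		return false, err
	}
	return labels[kasAccessLabel] == "true", nil
}

func main() {
	outputs := []struct{ name, labels string }{
		{"aws-ebs-csi-driver-operator", `{"name":"aws-ebs-csi-driver-operator"}`},
		{"cluster-storage-operator", `{"hypershift.openshift.io/hosted-control-plane":"clusters-hypershift-ci-3366","hypershift.openshift.io/need-management-kas-access":"true","name":"cluster-storage-operator"}`},
	}
	for _, o := range outputs {
		ok, err := hasKASAccessLabel(o.labels)
		if err != nil {
			fmt.Println(o.name, "error:", err)
			continue
		}
		fmt.Printf("%s: %s=true: %v\n", o.name, kasAccessLabel, ok)
	}
}
```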

Discussion: https://redhat-internal.slack.com/archives/GK0DA0JR5/p1694782231463969

Bug OCPBUGS-20506: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/operator-framework-rukpak/pull/36

Task MON-3502: Update node_exporter to v1.7.0

https://github.com/openshift/node_exporter/pull/139

Bug OCPBUGS-19192: Update 4.15 ose-network-interface-bond-cni image to be consistent with ART

Please review the following PR: https://github.com/openshift/bond-cni/pull/59

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/bond-cni/pull/59

Bug OCPBUGS-21922: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-etcd-operator/pull/1141

Bug OCPBUGS-24217: auth operator TLS artifacts should have ownership annotations

https://github.com/openshift/cluster-authentication-operator/pull/642

Bug OCPBUGS-22773: PowerVS: fix removeFromLoadBalancers

Description of problem:

Deploying a cluster results in:

time="2023-10-30T19:10:59-04:00" level=debug msg="Apply complete! Resources: 0 added, 0 changed, 3 destroyed."
time="2023-10-30T19:10:59-04:00" level=fatal msg="error destroying bootstrap resources failed disabling bootstrap load balancing: %!w(<nil>)"

Version-Release number of selected component (if applicable):

4.15.0

How reproducible:

Occasionally

Steps to Reproduce:

1. Deploy a PowerVS cluster in a zone with PER

Actual results:

The installation fails with the error shown above.

Expected results:

The cluster deploys correctly.

Additional info:

https://github.com/openshift/installer/pull/7653

Bug OCPBUGS-24111: Update 4.15 ose-alibaba-cloud-controller-manager-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cloud-provider-alibaba-cloud/pull/39

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cloud-provider-alibaba-cloud/pull/39

Bug OCPBUGS-23775: After PatternFly5 update: Form error is missing when import a container image

Issue 49 from https://docs.google.com/spreadsheets/d/1TR3ENY-GE_LQL9F-xH6NahtRHdu6IrYGhLqDDA8y0EI/edit#gid=1035185624

The form error is missing when importing a container image, while the Import from Git form shows the error correctly.

Screenshot: https://drive.google.com/file/d/1aUfUefnF3IxVzNjn7D3Q05pK9z4prVtN/view?usp=drive_link

https://github.com/openshift/console/pull/13365

Bug OCPBUGS-20563: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-baremetal-operator/pull/374

Bug OCPBUGS-23555: OAuthClient 'openshift-cli-client' is missing for HyperShift Guest Clusters causing `oc login --web` fails

Description of problem:

The oc login --web command fails when used with a Hypershift Guest Cluster. The web console returns an error message stating that the client is unauthorized to request a token using this method.
Error Message:
{  "error": "unauthorized_client",  
"error_description": "The client is not authorized to request a token using this method."
}

Standalone OCP clusters do not have this issue.

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-11-21-212406
4.14
4.15

How reproducible:

always

Steps to Reproduce:

1. Install a HyperShift guest cluster.
2. Configure any OpenID identity provider for the HyperShift guest cluster, e.g. https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-62511
3. Execute the oc login --web $URL command.

4. After the openshift-cli-client OAuthClient is created manually, the login works:
# cat oauth.yaml
apiVersion: oauth.openshift.io/v1
grantMethod: auto
kind: OAuthClient
metadata:
  name: openshift-cli-client
redirectURIs:
- http://127.0.0.1/callback
- http://[::1]/callback
respondWithChallenges: false

# oc create -f oauth.yaml
oauthclient.oauth.openshift.io/openshift-cli-client created

$ oc login --web $URL
Opening login URL in the default browser: https://oauth-clusters-hypershift-ci-28276.apps.xxxxxxxxxxxxxxxx.com:443/oauth/authorize?client_id=openshift-cli-client&code_challenge=mixnB73nR_yzL58e0lEd4soQH1sn0GjvWEfnX4PNrCg&code_challenge_method=S256&redirect_uri=http%3A%2F%2F127.0.0.1%3A45055%2Fcallback&response_type=code
Login successful.

Actual results:

Step 3: The web login process fails and redirects to an error page displaying the error message "error_description": "The client is not authorized to request a token using this method."

Expected results:

The 'openshift-cli-client' OAuthClient should exist by default in HyperShift guest clusters, as it does in OCP 4.13 and later, so that the oc login --web $URL command works without issues.

Additional info:

The issue can be tracked at the following URL: https://issues.redhat.com/browse/AUTH-444

Root cause:
The default 'openshift-cli-client' OAuthClient is missing from HyperShift guest clusters.

Bug OCPBUGS-24075: Update 4.15 ose-azure-file-csi-driver-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/azure-file-csi-driver/pull/44

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/azure-file-csi-driver/pull/44

Bug OCPBUGS-19013: ovnkube-trace compatibility issue on RHEL8.6

Description of problem:

There is a regression in ovnkube-trace compatibility with RHEL 8.6.

On 4.13.6, the ovnkube-trace binary can be used on RHEL 8.6; its only issue is 'pip3 not available', the same as https://issues.redhat.com/browse/OCPBUGS-15914

On 4.13.7, however, the ovnkube-trace binary can no longer be used on RHEL 8.6 and fails with the following glibc errors:
./ovnkube-trace: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by ./ovnkube-trace) 
./ovnkube-trace: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by ./ovnkube-trace)

Version-Release number of selected component (if applicable):

4.13.7

How reproducible:

always

Steps to Reproduce:

1. Install OCP 4.13.7.

2. Copy the ovnkube-trace binary from an ovnkube-master pod to the local host:
$ POD=$(oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-master -o name | head -1 | awk -F '/' '{print $NF}')
$ oc cp -n openshift-ovn-kubernetes $POD:/usr/bin/ovnkube-trace ovnkube-trace
Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker
tar: Removing leading `/' from member names
$ chmod +x ovnkube-trace
$ ls -l ovnkube-trace
-rwxrwxr-x. 1 cloud-user cloud-user 45947136 Sep 14 03:10 ovnkube-trace

3. Run the ovnkube-trace help command:
$ ./ovnkube-trace -h

Actual results:

$ ./ovnkube-trace -h 
./ovnkube-trace: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by ./ovnkube-trace) 
./ovnkube-trace: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by ./ovnkube-trace)

Expected results:

ovnkube-trace can be used on RHEL8.6
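RHEL 8 ships glibc 2.28, so a binary that references GLIBC_2.32 or GLIBC_2.34 symbol versions cannot be loaded there. This is a minimal Go sketch, using only the standard library's debug/elf package, that reports the highest glibc symbol version a binary requires; it could be run against ovnkube-trace before copying it to a host. The program and function names are hypothetical, and this does not describe the fix in the linked PR.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseGlibc turns a symbol version such as "GLIBC_2.34" into (2, 34, true).
// "GLIBC_2.2.5" keeps only major.minor; "GLIBC_PRIVATE" is rejected.
func parseGlibc(v string) (int, int, bool) {
	if !strings.HasPrefix(v, "GLIBC_") {
		return 0, 0, false
	}
	parts := strings.Split(strings.TrimPrefix(v, "GLIBC_"), ".")
	if len(parts) < 2 {
		return 0, 0, false
	}
	major, err1 := strconv.Atoi(parts[0])
	minor, err2 := strconv.Atoi(parts[1])
	if err1 != nil || err2 != nil {
		return 0, 0, false
	}
	return major, minor, true
}

// maxGlibc returns the highest GLIBC_x.y version in versions, or "" if none.
func maxGlibc(versions []string) string {
	best, bestMaj, bestMin := "", -1, -1
	for _, v := range versions {
		maj, mnr, ok := parseGlibc(v)
		if !ok {
			continue
		}
		if maj > bestMaj || (maj == bestMaj && mnr > bestMin) {
			best, bestMaj, bestMin = v, maj, mnr
		}
	}
	return best
}

// importedVersions lists the symbol versions of an ELF binary's dynamic
// imports. A statically linked binary typically has no dynamic symbol table,
// in which case debug/elf returns an error.
func importedVersions(path string) ([]string, error) {
	f, err := elf.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	syms, err := f.ImportedSymbols()
	if err != nil {
		return nil, err
	}
	versions := make([]string, 0, len(syms))
	for _, s := range syms {
		versions = append(versions, s.Version)
	}
	return versions, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: glibc-check <path-to-binary>")
		return
	}
	versions, err := importedVersions(os.Args[1])
	if err != nil {
		fmt.Println("cannot read dynamic imports:", err)
		return
	}
	fmt.Println("highest glibc version required:", maxGlibc(versions))
}
```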

Additional info:

https://github.com/openshift/ovn-kubernetes/pull/1887

Bug OCPBUGS-19792: OVN-Kubernetes node webhook does not allow to set k8s.ovn.org/node-mgmt-port and k8s.ovn.org/gateway-mtu-support

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:

https://github.com/openshift/ovn-kubernetes/pull/1911

Bug OCPBUGS-19214: Update 4.15 ose-oauth-apiserver image to be consistent with ART

Please review the following PR: https://github.com/openshift/oauth-apiserver/pull/90

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/oauth-apiserver/pull/90

Bug OCPBUGS-23778: After PatternFly5 update: Details page uses a bold font for the action dropdown

Issue 51 from https://docs.google.com/spreadsheets/d/1TR3ENY-GE_LQL9F-xH6NahtRHdu6IrYGhLqDDA8y0EI/edit#gid=1035185624

The Actions dropdown on details pages now uses a bolder font, which other action buttons do not use.

Investigation findings: PF5 sets button elements to font-family: inherit, and because this button is inside an <h1>, it gets the RedHatDisplay font family instead of RedHatText. A quick fix would be to add font-family: var(--pf-v5-global--FontFamily--text) to .co-actions.

https://github.com/openshift/console/pull/13375

Bug OCPBUGS-25406: 4.14-fast ARO after upgrade to 4.14 new Machinesets do not get worker config

Description of problem:

On an ARO cluster on the 4.14 fast channel (4.14.5), when the customer tried to add a new node after the upgrade, the MachineConfig was not applied and the node never joined the pool. This happens for every new node and can only be remediated by SRE, not by the customer.

Version-Release number of selected component (if applicable):

4.14.5 -candidate

How reproducible:

Every time a node is added to a cluster at this version.

Steps to Reproduce:

    1. Install an ARO cluster
    2. Upgrade it to 4.14 on the fast channel
    3. Add a node

Actual results:

      message: >-
        could not Create/Update MachineConfig: Operation cannot be fulfilled on
        machineconfigs.machineconfiguration.openshift.io
        "99-worker-generated-kubelet": the object has been modified; please
        apply your changes to the latest version and try again
      status: 'False'
      type: Failure
    - lastTransitionTime: '2023-11-29T17:44:37Z'

Expected results:

Node is created and configured correctly.

Additional info:

 MissingStaticPodControllerDegraded: static pod lifecycle failure - static pod: "kube-apiserver" in namespace: "openshift-kube-apiserver" for revision: 15 on node: "aro-cluster-REDACTED-master-0" didn't show up, waited: 4m45s

https://github.com/openshift/machine-config-operator/pull/4087

Bug OCPBUGS-18352: winc upgrades are failing from 4.13 -> 4.14 due to remote ovnkube-controller is not ready

Description of problem:

From CLBO ovnkube-node logs:

Upgrade hack: Timed out waiting for the remote ovnkube-controller to be ready even after 5 minutes, err : context deadline exceeded, unable to fetch node-subnet annotation for node ip-10-0-133-201.us-east-2.compute.internal: err, could not find "k8s.ovn.org/node-subnets" annotation

Here, ovnkube-controller not being ready implies that the k8s.ovn.org/node-subnets annotation is absent.

The CNO upgrade is stuck at: DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - last change 2023-08-30T10:06:44Z

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-08-11-055332

How reproducible:

Always

Steps to Reproduce:

1. Install an OCP 4.13 cluster with RHCOS and Windows nodes.
2. Upgrade the cluster to 4.14.

Actual results:

The upgrade fails on CNO.

Expected results:

The upgrade should succeed.

Additional info:

must-gather: http://shell.lab.bos.redhat.com/~anusaxen/must-gather.local.1473221474492991466/

https://github.com/openshift/ovn-kubernetes/pull/1907

Bug OCPBUGS-18246: Azure AD Workload Identity does not work with bring your own vnet

Description of problem:

The role assignments that ccoctl creates for Azure AD Workload Identity cannot be scoped to the resource group containing the customer VNet in a bring-your-own-VNet installation workflow.

https://docs.openshift.com/container-platform/4.13/installing/installing_azure/installing-azure-vnet.html

Version-Release number of selected component (if applicable):

4.14.0

How reproducible:

100%

Steps to Reproduce:

1. Create an Azure resource group and a VNet for OpenShift within that resource group.
2. Create Azure AD Workload Identity infrastructure with ccoctl.
3. Follow the steps to configure an existing VNet for installation, setting networkResourceGroupName in the install config.
4. Attempt cluster installation.

Actual results:

Cluster installation fails.

Expected results:

Cluster installation succeeds.

Additional info:

ccoctl must be extended to accept a parameter specifying the network resource group name and scope relevant component role assignments to the network resource group in addition to the installation resource group.

https://github.com/openshift/cloud-credential-operator/pull/597

Bug OCPBUGS-21874: Update 4.15 ose-agent-installer-utils-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/agent-installer-utils/pull/31

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/agent-installer-utils/pull/31

Bug OCPBUGS-15087: /sysroot mountpoint failed to resize automatically on new nodes during machineset scaleup

Description of problem:
New machines got stuck in Provisioned state when the customer tried to scale the machineset.
~~~
NAME PHASE TYPE REGION ZONE AGE
ocp4-ftf8t-worker-2-wn6lp Provisioned 44m
ocp4-ftf8t-worker-redhat-x78s5 Provisioned 44m
~~~

The journalctl logs from these VMs showed "no space left on device" errors while pulling images.

To troubleshoot further, we had to reset the root password in order to log in.

After resetting the root password, we logged in and checked the journalctl logs for errors.
Image pulls were failing with "no space left on device". The df -h output showed that /dev/sda4 (/dev/mapper/coreos-luks-root-nocrypt), mounted on /sysroot, was 100% full.
Because the images could not be pulled, machine-config-daemon-firstboot.service could not complete, so the node could neither be updated to 4.12 nor join the cluster.
The remaining errors were side effects of the "no space left on device" error.
/dev/sda4 was correctly partitioned to 120 GiB, and its partition scheme matched that of a working system.
However, the filesystem was only 2.8 GiB instead of 120 GiB.
We manually grew the / filesystem (xfs_growfs /), after which the / mount was resized to 120 GiB.
The node rebooted after this step and came up successfully with RHCOS 4.12.
After waiting for the node to come up with kubelet and CRI-O running, we approved the certificates, and the node is now part of the cluster.

While checking the logs for the RCA, we found the following errors, which might help determine why the /sysroot mountpoint was not resized:
~~~
$ grep -i growfs sos_commands/logs/journalctl_--no-pager_--since_-3days
Jun 12 10:37:30 ocp4-ftf8t-worker-2-wn6lp systemd[1]: ignition-ostree-growfs.service: Failed to load configuration: No such file or directory <---
Jun 12 10:37:30 ocp4-ftf8t-worker-2-wn6lp systemd[1]: ignition-ostree-growfs.service: Collecting.
~~~

Version-Release number of selected component (if applicable):
OCP 4.12.18.
IPI installation on RHV.

How reproducible:
Not able to reproduce the issue.

Steps to Reproduce:

Actual results:
The /sysroot mountpoint was not resized to the actual size of the /dev/sda4 partition, which prevented machine-config-daemon-firstboot.service from completing; the node was stuck at RHCOS version 4.6.

As a workaround, the customer currently has to manually resize the /sysroot mountpoint every time they add a new node to the cluster.

Expected results:
The /sysroot mountpoint should be automatically resized as part of the ignition-ostree-growfs.sh script.

Additional info:
The customer recently migrated from an old storage domain to a new one on RHV, which may be relevant. However, they ran successful machineset scale-up tests with the new storage domain on OCP 4.11.33 (before upgrading OCP).
They started seeing the issue with all machinesets (new and existing) only after upgrading OCP to 4.12.18.

https://github.com/openshift/machine-config-operator/pull/3865

Bug OCPBUGS-18963: [metal3] The BMH is stuck in registering "failed to register host in ironic: Bad Gateway"

OCP 4.14.0-rc.0
advanced-cluster-management.v2.9.0-130
multicluster-engine.v2.4.0-154

After encountering https://issues.redhat.com/browse/OCPBUGS-18959, we:

1. Attempted to forcefully delete the BMH by removing its finalizer.
2. Deleted all the metal3 pods.
3. Attempted to re-create the BMH.

Result:
The BMH is stuck in the registering state:

oc get bmh
NAME                                           STATE         CONSUMER   ONLINE   ERROR   AGE
hp-e910-01.kni-qe-65.lab.eng.rdu2.redhat.com   registering              true             15m

The BMO log shows these entries:

{"level":"info","ts":"2023-09-13T16:15:57Z","logger":"controllers.BareMetalHost","msg":"start","baremetalhost":{"name":"hp-e910-01.kni-qe-65.lab.eng.rdu2.redhat.com","namespace":"kni-qe-65"}}
{"level":"info","ts":"2023-09-13T16:15:57Z","logger":"controllers.BareMetalHost","msg":"hardwareData is ready to be deleted","baremetalhost":{"name":"hp-e910-01.kni-qe-65.lab.eng.rdu2.redhat.com","namespace":"kni-qe-65"}}
{"level":"info","ts":"2023-09-13T16:15:57Z","logger":"controllers.BareMetalHost","msg":"host ready to be powered off","baremetalhost":{"name":"hp-e910-01.kni-qe-65.lab.eng.rdu2.redhat.com","namespace":"kni-qe-65"},"provisioningState":"powering off before delete"}
{"level":"info","ts":"2023-09-13T16:15:57Z","logger":"provisioner.ironic","msg":"ensuring host is powered off (mode: hard)","host":"kni-qe-65~hp-e910-01.kni-qe-65.lab.eng.rdu2.redhat.com"}

{"level":"error","ts":"2023-09-13T16:15:57Z","msg":"Reconciler error","controller":"baremetalhost","controllerGroup":"metal3.io","controllerKind":"BareMetalHost","BareMetalHost":{"name":"hp-e910-01.kni-qe-65.lab.eng.rdu2.redhat.com","namespace":"kni-qe-65"},"namespace":"kni-qe-65","name":"hp-e910-01.kni-qe-65.lab.eng.rdu2.redhat.com","reconcileID":"167061cc-7ab4-4c4a-ae45-8c19dfc3ac22","error":"action \"powering off before delete\" failed: failed to power off before deleting node: Host not registered","errorVerbose":"Host not registered\nfailed to power off before deleting node\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).actionPowerOffBeforeDeleting\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:493\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).handlePoweringOffBeforeDelete\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:585\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).ReconcileState\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:202\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:225\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:314\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1598\naction \"powering off before delete\" failed\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:229\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:314\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1598","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226"}

https://github.com/openshift/ironic-image/pull/401

Bug OCPBUGS-19226: Update 4.15 ose-cluster-control-plane-machine-set-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/241

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/241

Bug OCPBUGS-19836: duplicated log at addOrUpdateSubnet

Description of problem:

Some log messages are duplicated because addOrUpdateSubnet is called twice, which is misleading.

Version-Release number of selected component (if applicable):

4.15

How reproducible:

Always

Steps to Reproduce:

1. Start it up
2. Check logs.
3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/ovn-kubernetes/pull/1911

Bug OCPBUGS-19137: Update 4.15 cluster-etcd-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-etcd-operator/pull/1115

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-etcd-operator/pull/1115

Bug OCPBUGS-19381: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/hypershift/pull/3024

Bug OU-298: monitoring plugin docker does not load files

The provided Dockerfile is misconfigured and does not load the files generated by the build command.

https://github.com/openshift/monitoring-plugin/pull/83

Bug OCPBUGS-21814: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/machine-config-operator/pull/3988

Bug OCPBUGS-24303: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/installer/pull/7893

Bug OCPBUGS-24338: Update 4.15 ose-csi-external-snapshotter-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/csi-external-snapshotter/pull/123

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/csi-external-snapshotter/pull/123

Bug OCPBUGS-17293: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/machine-api-provider-ibmcloud/pull/25

Bug OCPBUGS-23071: inspect collection of resources is reported as an error (not a warning)

Raised as part of this Slack thread.

Description of problem:

When SRE collects data using `oc adm inspect`, the collection reports an error on 'secrets' (see below). This is due to the way SRE manages our hosted platforms: the SRE users (service accounts) are not 'true admins' and must impersonate admins to perform operations.

$ oc adm inspect --dest-dir=must-gather ns/openshift-sdn

Gathering data for ns/openshift-sdn...
...
Wrote inspect data to must-gather.
error: errors occurred while gathering data:
    secrets is forbidden: User "system:serviceaccount:openshift-backplane-srep:f2b5cf795ef1fc5289490411d49ab042" cannot list resource "secrets" in API group "" in the namespace "openshift-sdn"

Ultimately, this 'error' is not a true error. It is closer to a warning that tells the user a specific object was not collected.

https://github.com/openshift/oc/pull/1601

Bug OCPBUGS-24154: Update 4.15 ose-cluster-machine-approver-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-machine-approver/pull/217

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-machine-approver/pull/217

Story TRT-1354: Implement solution for structured intervals with a row differentiator in origin

In Spyglass charts, rows sometimes require an additional field in the locator so that items appear on separate lines. Node state is a good example: OS updates, phases, and NotReady must each appear on a separate line; otherwise they overlap and nothing is readable. This will also be useful for pod logs and similar intervals.

The goal is for origin to be able to add new intervals without requiring an update to the JS (which will live in Sippy) to display them properly. We need a way to split structured intervals into separate rows within the same group.

We are leaning towards adding row/foo to the locator, since the locator is the value that identifies each row.
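A minimal sketch of the row/foo idea, assuming (hypothetically) that a locator is a space-separated string of key/value tokens and that the chart uses the full locator as the row identity. The `interval` type, `withRow`, and `rowsFor` are illustrative names, not origin or Sippy APIs.

```go
package main

import "fmt"

// interval is a hypothetical, trimmed-down stand-in for a structured interval.
type interval struct {
	Locator string
	Message string
}

// withRow appends a row differentiator, e.g. "node/worker-1" -> "node/worker-1 row/phases".
func withRow(locator, row string) string {
	return locator + " row/" + row
}

// rowsFor groups intervals into chart rows, using the full locator as the row identity.
func rowsFor(intervals []interval) map[string][]interval {
	rows := make(map[string][]interval)
	for _, iv := range intervals {
		rows[iv.Locator] = append(rows[iv.Locator], iv)
	}
	return rows
}

func main() {
	node := "node/worker-1"

	// Without a differentiator, all three node-state intervals share one row and overlap.
	plain := []interval{
		{node, "OS update"},
		{node, "phase Working"},
		{node, "NotReady"},
	}

	// With row/<name> added, each kind of node state gets its own row.
	split := []interval{
		{withRow(node, "os-update"), "OS update"},
		{withRow(node, "phases"), "phase Working"},
		{withRow(node, "notready"), "NotReady"},
	}

	fmt.Println(len(rowsFor(plain)), len(rowsFor(split))) // 1 3
}
```

Because the row key lives in the locator, origin can introduce a new row without any change to the rendering code.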

https://github.com/openshift/origin/pull/28376

Bug OCPBUGS-18517: Fail to install with Kuryr due to issue when validating certificate for the API

Description of problem:

Installation with Kuryr fails because multiple components that connect to the API fail with the following error:

failed checking apiserver connectivity: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-service-ca/leases/service-ca-controller-lock": tls: failed to verify certificate: x509: cannot validate certificate for 172.30.0.1 because it doesn't contain any IP SANs

$ oc get po -A -o wide |grep -v Running |grep -v Pending |grep -v Completed
NAMESPACE                                          NAME                                                        READY   STATUS             RESTARTS          AGE     IP              NODE                   NOMINATED NODE   READINESS GATES
openshift-apiserver-operator                       openshift-apiserver-operator-559d855c56-c2rdr               0/1     CrashLoopBackOff   42 (2m28s ago)    3h44m   10.128.16.86    kuryr-5sxhw-master-2   <none>           <none>
openshift-apiserver                                apiserver-6b9f5d48c4-bj6s6                                  0/2     CrashLoopBackOff   92 (4m25s ago)    3h36m   10.128.70.10    kuryr-5sxhw-master-2   <none>           <none>
openshift-cluster-csi-drivers                      manila-csi-driver-operator-75b64d8797-fckf5                 0/1     CrashLoopBackOff   42 (119s ago)     3h41m   10.128.56.21    kuryr-5sxhw-master-0   <none>           <none>
openshift-cluster-csi-drivers                      openstack-cinder-csi-driver-operator-84dfd8d89f-kgtr8       0/1     CrashLoopBackOff   42 (82s ago)      3h41m   10.128.56.9     kuryr-5sxhw-master-0   <none>           <none>
openshift-cluster-node-tuning-operator             cluster-node-tuning-operator-7fbb66545c-kh6th               0/1     CrashLoopBackOff   46 (3m5s ago)     3h44m   10.128.6.40     kuryr-5sxhw-master-2   <none>           <none>
openshift-cluster-storage-operator                 cluster-storage-operator-5545dfcf6d-n497j                   0/1     CrashLoopBackOff   42 (2m23s ago)    3h44m   10.128.21.175   kuryr-5sxhw-master-2   <none>           <none>
openshift-cluster-storage-operator                 csi-snapshot-controller-ddb9469f9-bc4bb                     0/1     CrashLoopBackOff   45 (2m17s ago)    3h41m   10.128.20.106   kuryr-5sxhw-master-1   <none>           <none>
openshift-cluster-storage-operator                 csi-snapshot-controller-operator-6d7b66dbdd-xdwcs           0/1     CrashLoopBackOff   42 (92s ago)      3h44m   10.128.21.220   kuryr-5sxhw-master-2   <none>           <none>
openshift-config-operator                          openshift-config-operator-c5d5d964-2w2bv                    0/1     CrashLoopBackOff   80 (3m39s ago)    3h44m   10.128.43.39    kuryr-5sxhw-master-2   <none>           <none>
openshift-controller-manager-operator              openshift-controller-manager-operator-754d748cf7-rzq6f      0/1     CrashLoopBackOff   42 (3m6s ago)     3h44m   10.128.25.166   kuryr-5sxhw-master-2   <none>           <none>
openshift-etcd-operator                            etcd-operator-76ddc94887-zqkn7                              0/1     CrashLoopBackOff   49 (30s ago)      3h44m   10.128.32.146   kuryr-5sxhw-master-2   <none>           <none>
openshift-ingress-operator                         ingress-operator-9f76cf75b-cjx9t                            1/2     CrashLoopBackOff   39 (3m24s ago)    3h44m   10.128.9.108    kuryr-5sxhw-master-2   <none>           <none>
openshift-insights                                 insights-operator-776cd7cfb4-8gzz7                          0/1     CrashLoopBackOff   46 (4m21s ago)    3h44m   10.128.15.102   kuryr-5sxhw-master-2   <none>           <none>
openshift-kube-apiserver-operator                  kube-apiserver-operator-64f4db777f-7n9jv                    0/1     CrashLoopBackOff   42 (113s ago)     3h44m   10.128.18.199   kuryr-5sxhw-master-2   <none>           <none>
openshift-kube-apiserver                           installer-5-kuryr-5sxhw-master-1                            0/1     Error              0                 3h35m   10.128.68.176   kuryr-5sxhw-master-1   <none>           <none>
openshift-kube-controller-manager-operator         kube-controller-manager-operator-746497b-dfbh5              0/1     CrashLoopBackOff   42 (2m23s ago)    3h44m   10.128.13.162   kuryr-5sxhw-master-2   <none>           <none>
openshift-kube-controller-manager                  installer-4-kuryr-5sxhw-master-0                            0/1     Error              0                 3h35m   10.128.65.186   kuryr-5sxhw-master-0   <none>           <none>
openshift-kube-scheduler-operator                  openshift-kube-scheduler-operator-695fb4449f-j9wqx          0/1     CrashLoopBackOff   42 (63s ago)      3h44m   10.128.44.194   kuryr-5sxhw-master-2   <none>           <none>
openshift-kube-scheduler                           installer-5-kuryr-5sxhw-master-0                            0/1     Error              0                 3h35m   10.128.60.44    kuryr-5sxhw-master-0   <none>           <none>
openshift-kube-storage-version-migrator-operator   kube-storage-version-migrator-operator-6c5cd46578-qpk5z     0/1     CrashLoopBackOff   42 (2m18s ago)    3h44m   10.128.4.120    kuryr-5sxhw-master-2   <none>           <none>
openshift-machine-api                              cluster-autoscaler-operator-7b667675db-tmlcb                1/2     CrashLoopBackOff   46 (2m53s ago)    3h45m   10.128.28.146   kuryr-5sxhw-master-2   <none>           <none>
openshift-machine-api                              machine-api-controllers-fdb99649c-ldb7t                     3/7     CrashLoopBackOff   184 (2m55s ago)   3h40m   10.128.29.90    kuryr-5sxhw-master-0   <none>           <none>
openshift-route-controller-manager                 route-controller-manager-d8f458684-7dgjm                    0/1     CrashLoopBackOff   43 (100s ago)     3h36m   10.128.55.11    kuryr-5sxhw-master-2   <none>           <none>
openshift-service-ca-operator                      service-ca-operator-654f68c77f-g4w55                        0/1     CrashLoopBackOff   42 (2m2s ago)     3h45m   10.128.22.30    kuryr-5sxhw-master-2   <none>           <none>
openshift-service-ca                               service-ca-5f584b7d75-mxllm                                 0/1     CrashLoopBackOff   42 (45s ago)      3h42m   10.128.49.250   kuryr-5sxhw-master-0   <none>           <none>

$ oc get svc -A |grep  172.30.0.1 
default                                            kubernetes                                       ClusterIP   172.30.0.1       <none>        443/TCP                           3h50m

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/cluster-network-operator/pull/1988

Bug OCPBUGS-19950: Clusters stopped provisioning via ztp/acm in large scale test environment with mixed clusters types (SNO, Compact, Standard sized clusters) with bad gateway error

Description of problem:

While attempting to provision 300 clusters of mixed sizes (SNO, compact, and standard) every hour, the metal3 baremetal operator appears to have hit a failure that prevents it from provisioning any clusters. Out of 1850 attempted clusters, only 282 were successfully provisioned (mostly SNO).

There are many errors in the baremetal operator log, some of which are stack traces, but it is unclear whether they are the actual reason the clusters began failing to install, with 100% failing from the third wave onward.

Version-Release number of selected component (if applicable):

Hub OCP - 4.14.0-rc.2
Deployed Cluster OCP - 4.14.0-rc.2
ACM - 2.9.0-DOWNSTREAM-2023-09-27-22-12-46

How reproducible:

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

Some of the errors found in the logs:
{"level":"error","ts":"2023-09-28T22:39:56Z","msg":"Reconciler error","controller":"baremetalhost","controllerGroup":"metal3.io","controllerKind":"BareMetalHost","BareMetalHost":{"name":"vm01343","namespace":"compact-00046"},"namespace":"compact-00046","name":"vm01343","reconcileID":"4bbfa52f-12a6-4983-b86b-01086491de9f","error":"action \"provisioning\" failed: failed to provision: failed to change provisioning state to \"active\": Internal Server Error","errorVerbose":"Internal Server Error\nfailed to change provisioning state to \"active\"\ngithub.com/metal3-io/baremetal-operator/pkg/provisioner/ironic.(*ironicProvisioner).tryChangeNodeProvisionState\n\t/go/src/github.com/metal3-io/baremetal-operator/pkg/provisioner/ironic/ironic.go:740\ngithub.com/metal3-io/baremetal-operator/pkg/provisioner/ironic.(*ironicProvisioner).changeNodeProvisionState\n\t/go/src/github.com/metal3-io/baremetal-operator/pkg/provisioner/ironic/ironic.go:750\ngithub.com/metal3-io/baremetal-operator/pkg/provisioner/ironic.(*ironicProvisioner).Provision\n\t/go/src/github.com/metal3-io/baremetal-operator/pkg/provisioner/ironic/ironic.go:1604\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).actionProvisioning\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:1179\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).handleProvisioning\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:527\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).ReconcileState\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:202\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:225\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:314\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1598\nfailed to provision\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).actionProvisioning\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:1188\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).handleProvisioning\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:527\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).ReconcileState\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:202\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:225\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:314\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1598\naction \"provisioning\" failed\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:229\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:314\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1598","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226"}


{"level":"info","ts":"2023-09-29T16:11:24Z","logger":"provisioner.ironic","msg":"error caught while checking endpoint","host":"standard-00241~vm03618","endpoint":"https://metal3-state.openshift-machine-api.svc.cluster.local:6388/v1/","error":"Bad Gateway"}

https://github.com/openshift/ironic-agent-image/pull/101

Bug OCPBUGS-21720: Use CentOS Stream to build libvirt images

Description of problem:

Use CentOS Stream to build libvirt images.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/installer/pull/6813

Bug OCPBUGS-24146: Update 4.15 ose-vertical-pod-autoscaler-operator-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/vertical-pod-autoscaler-operator/pull/149

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/kubernetes-autoscaler/pull/269

Bug OCPBUGS-18771: Keepalived pods crashes and fail to start on worker node (Ingress VIP)

Description of problem:

A customer reported that keepalived pods crash and fail to start on worker nodes (Ingress VIP). The keepalived pod (labeled app=kni-infra-vrrp) is expected to start. This affects every OCP v4.13 deployment that uses an Ingress VIP and may be caused by a bug in the nodeip-configuration service in v4.13.

More details:

-> There are two problems in OCP v4.13: the regular expression does not match, and the chroot command fails because ldd libraries are missing inside the container. This has been fixed in 4.14, but not in 4.13.

-> The nodeip-configuration service creates the /run/nodeip-configuration/remote-worker file based on onPremPlatformAPIServerInternalIPs (apiVIP) and ignores onPremPlatformIngressIPs (ingressVIP), as can be seen in the source code.

-> As a result, the keepalived process does not start, because the remote-worker file exists.

-> The liveness probes then fail because the keepalived process does not exist.

The fix is simple (as the customer pointed out): the nodeip-configuration.service template needs to be extended to consider the Ingress VIPs as well. The change needs to be made in the source code of that template.

As the following snippet shows, node-ip ranges only over onPremPlatformAPIServerInternalIPs and ignores onPremPlatformIngressIPs.

node-ip \
    set \
    --platform {{ .Infra.Status.PlatformStatus.Type }} \
    {{if not (isOpenShiftManagedDefaultLB .) -}}
    --user-managed-lb \
    {{end -}}
    {{if or (eq .IPFamilies "IPv6") (eq .IPFamilies "DualStackIPv6Primary") -}}
    --prefer-ipv6 \
    {{end -}}
    --retry-on-failure \
    {{ range onPremPlatformAPIServerInternalIPs . }}{{.}} {{end}}; \
    do \
    sleep 5; \
    done"
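A minimal sketch of the proposed extension, rendering the trailing VIP arguments with Go's text/template so that they range over both onPremPlatformAPIServerInternalIPs and onPremPlatformIngressIPs. The `vipConfig` type and the function bodies in the FuncMap are hypothetical stand-ins for the real MCO template data and helpers. The sketch also assumes that node-ip accepts the ingress VIPs as additional positional arguments; it is not necessarily how the linked PR implements the fix.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// vipConfig is a hypothetical stand-in for the data MCO passes to its templates.
type vipConfig struct {
	APIVIPs     []string
	IngressVIPs []string
}

// nodeIPArgs renders only the trailing VIP arguments of the node-ip command,
// now ranging over the Ingress VIPs as well as the API VIPs.
const nodeIPArgs = `{{ range onPremPlatformAPIServerInternalIPs . }}{{.}} {{end}}{{ range onPremPlatformIngressIPs . }}{{.}} {{end}}`

func renderVIPArgs(cfg vipConfig) (string, error) {
	funcs := template.FuncMap{
		// Hypothetical implementations; the real MCO helpers read cluster configuration.
		"onPremPlatformAPIServerInternalIPs": func(c vipConfig) []string { return c.APIVIPs },
		"onPremPlatformIngressIPs":           func(c vipConfig) []string { return c.IngressVIPs },
	}
	// Funcs must be registered before Parse so the template can resolve them.
	tmpl, err := template.New("node-ip").Funcs(funcs).Parse(nodeIPArgs)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := tmpl.Execute(&b, cfg); err != nil {
		return "", err
	}
	return strings.TrimSpace(b.String()), nil
}

func main() {
	out, err := renderVIPArgs(vipConfig{
		APIVIPs:     []string{"192.0.2.10"},
		IngressVIPs: []string{"192.0.2.11"},
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(out) // 192.0.2.10 192.0.2.11
}
```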

The difference between OCP v4.12 and v4.13 related to the keepalived pod is also shown in the attached image.

Version-Release number of selected component (if applicable):

v4.13

How reproducible:

Steps to Reproduce:

1.
2.
3.

Actual results:

The keepalived pods crash and fail to start on worker nodes (Ingress VIP).

Expected results:

The expectation is that the keepalived pod (labeled by app=kni-infra-vrrp) should start.

Additional info:

https://github.com/openshift/machine-config-operator/pull/3943

Bug OCPBUGS-18800: Fix MCO Image Registry ConfigMap updating

Description of problem:

Currently, the MCO updates its image registry certificate ConfigMap by deleting and re-creating it on each MCO sync. Instead, it should patch the ConfigMap.
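One way to patch instead of deleting and re-creating is to compute a JSON merge patch (RFC 7386) that contains only the changed `data` keys, with removed keys set to `null`. The stdlib-only sketch below uses a hypothetical helper, `mergePatchForData`, and is not the code from the linked MCO PR. With client-go, such a body could be sent with `ConfigMaps(namespace).Patch(ctx, name, types.MergePatchType, body, metav1.PatchOptions{})`, and the call skipped entirely when nothing changed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergePatchForData returns a JSON merge patch that turns the existing
// ConfigMap .data into desired, plus false when no change is needed.
func mergePatchForData(existing, desired map[string]string) ([]byte, bool, error) {
	changes := map[string]any{}
	for k, v := range desired {
		if old, ok := existing[k]; !ok || old != v {
			changes[k] = v // new or modified key
		}
	}
	for k := range existing {
		if _, ok := desired[k]; !ok {
			changes[k] = nil // null removes the key under merge-patch semantics
		}
	}
	if len(changes) == 0 {
		return nil, false, nil // nothing to do: no API call, no object churn
	}
	body, err := json.Marshal(map[string]any{"data": changes})
	return body, true, err
}

func main() {
	existing := map[string]string{"a.crt": "old", "stale.crt": "x"}
	desired := map[string]string{"a.crt": "new"}
	body, changed, err := mergePatchForData(existing, desired)
	if err != nil {
		panic(err)
	}
	fmt.Println(changed, string(body)) // true {"data":{"a.crt":"new","stale.crt":null}}
}
```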

Version-Release number of selected component (if applicable):

4.14

How reproducible:

always

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/machine-config-operator/pull/3851

Bug OCPBUGS-19196: Update 4.15 operator-registry image to be consistent with ART

Please review the following PR: https://github.com/openshift/operator-framework-olm/pull/563

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/operator-framework-olm/pull/563

Bug OCPBUGS-23082: Set automountServiceAccountToken to false for network-node-identity deployment in Hypershift

Description of problem:

From our initial investigation, it appears that the network-node-identity component does not need management cluster access in Hypershift.

We were looking at:
https://github.com/openshift/cluster-network-operator/blob/release-4.14/bindata/network/node-identity/managed/node-identity.yaml

For the webhook and approver container: https://github.com/openshift/ovn-kubernetes/blob/release-4.14/go-controller/cmd/ovnkube-identity/ovnkubeidentity.go

For the token minter container: https://github.com/openshift/hypershift/blob/release-4.14/token-minter/tokenminter.go

We also tested with automountServiceAccountToken disabled, and everything still appeared to function.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1. Deploy a 4.14 hosted cluster
2.
3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/cluster-network-operator/pull/2100

Bug OCPBUGS-19156: Update 4.15 ose-machine-api-provider-openstack image to be consistent with ART

Please review the following PR: https://github.com/openshift/machine-api-provider-openstack/pull/84

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/machine-api-provider-openstack/pull/84

Task SPLAT-1280: [vsphere] update control plane machinset documentation in repo

https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/267

Task MON-3530: Update Owners file in Openshift-State-Metric repository

Update the OWNERS file in the openshift-state-metrics repository: add new teammates and remove former teammates.

https://github.com/openshift/openshift-state-metrics/pull/110

Bug OCPBUGS-19143: Update 4.15 kube-rbac-proxy image to be consistent with ART

Please review the following PR: https://github.com/openshift/kube-rbac-proxy/pull/72

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Bug MGMT-15984: Assisted installer doesn't freeze and unmount file systems used for overwriting os image

Description of the problem:

The assisted installer does not freeze and unmount the file systems used for overwriting the OS image.
This corrupts the file system.

How reproducible:

Always for ZTP flow.

Steps to reproduce:

1. Run ZTP with enable-skip-mco-reboot set to true

Actual results:

Installation fails. Host drops to emergency shell.

Expected results:

Successful installation.

https://github.com/openshift/assisted-installer/pull/737

Bug OCPBUGS-19110: Update 4.15 azure-file-csi-driver image to be consistent with ART

Please review the following PR: https://github.com/openshift/azure-file-csi-driver/pull/34

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/azure-file-csi-driver/pull/34

Bug OCPBUGS-19239: Update 4.15 openshift-enterprise-egress-router image to be consistent with ART

Please review the following PR: https://github.com/openshift/images/pull/151

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/images/pull/151

Bug OCPBUGS-22457: platform-operators-aggregated ClusterOperator manifest should not declare a namespace

Description of problem:

Cluster-scoped resources do not need (or want) metadata.namespace defined. Currently, the platform-operators-aggregated ClusterOperator manifest requests a namespace; that request should be dropped to avoid confusing human and robot readers.

Version-Release number of selected component (if applicable):

At least 4.15. I haven't dug back to count previous 4.y.

How reproducible:

100%

Steps to Reproduce:

$ oc adm release extract --to manifests quay.io/openshift-release-dev/ocp-release:4.15.0-ec.1-x86_64
$ grep -r5 platform-operators-aggregated manifests/ | grep namespace:

Actual results:

manifests/0000_50_cluster-platform-operator-manager_07-aggregated-clusteroperator.yaml-  namespace: openshift-platform-operators

Expected results:

No hits.

https://github.com/openshift/platform-operators/pull/100

Bug OCPBUGS-21826: Add warning if managementState is not managed for csi operator

We should log a prominent warning when customers change the managementState of a CSI operator, rather than logging it only at a lower log level.

I spent a non-trivial amount of time debugging a cluster where the CSI driver would not get installed, only to find out that the customer had somehow set managementState to Removed.
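A minimal sketch of a louder warning, assuming the operator's managementState is available as a string. The state names follow the operator ManagementState values in openshift/api; `managementStateWarning` is a hypothetical helper, and real operator code would more likely log through klog (for example `klog.Warningf`) than through the standard `log` package.

```go
package main

import (
	"fmt"
	"log"
)

// managementStateWarning returns a prominent warning when a CSI driver
// operator is set to a state in which it does not normally manage the driver.
func managementStateWarning(operator, state string) (string, bool) {
	switch state {
	case "Unmanaged", "Removed":
		return fmt.Sprintf("WARNING: %s managementState is %q; the CSI driver may not be installed or reconciled", operator, state), true
	default:
		return "", false
	}
}

func main() {
	if msg, ok := managementStateWarning("vmware-vsphere-csi-driver-operator", "Removed"); ok {
		log.Print(msg) // always emitted, not hidden behind a debug verbosity level
	}
}
```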

https://github.com/openshift/vmware-vsphere-csi-driver-operator/pull/178

Bug OCPBUGS-23939: Missing enabled_firmware_interfaces config

Description of problem:

The downstream Ironic image lacks the enabled_firmware_interfaces configuration option that was added upstream.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/ironic-image/pull/428

Bug OCPBUGS-24162: Update 4.15 ose-cluster-kube-controller-manager-operator-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-kube-controller-manager-operator/pull/772

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-kube-controller-manager-operator/pull/772

Bug OCPBUGS-24853: Update 4.16 ose-installer-artifacts-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/installer/pull/7818

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/installer/pull/7818

Bug OCPBUGS-25231: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/csi-node-driver-registrar/pull/60

Bug OCPBUGS-19356: Expose and propagate TopologySpreadConstraints for admission webhook

Backport facilitator for linked issue.

https://github.com/openshift/cluster-monitoring-operator/pull/2073

Bug OCPBUGS-24135: Update 4.15 ose-aws-cloud-controller-manager-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cloud-provider-aws/pull/58

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cloud-provider-aws/pull/58

Bug OCPBUGS-21612: `oc adm ocp-certificates monitor-certificates` can panic

Description of problem:

Running the command `oc adm ocp-certificates monitor-certificates` panics.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

100%

Steps to Reproduce:

1. `oc adm ocp-certificates monitor-certificates`

Actual results:

panic:

Expected results:

no panic

Additional info:

https://github.com/openshift/oc/pull/1589

Bug OCPBUGS-22217: Upstream OLM flaky-e2e-tests suite failing

Description of problem:

The flaky-e2e-test suite has been failing consistently due to changes in how the test environment is set up in each test. Two tests in particular have been failing and need to be fixed:
[FLAKE] should clear up the condition in the InstallPlan status that contains an error message when a valid OperatorGroup is created
[FLAKE] consistent generation

Version-Release number of selected component (if applicable):

How reproducible:

100%

Steps to Reproduce:

1. Run flaky-e2e-test suite

Actual results:

Tests never pass

Expected results:

Tests pass at least a majority of the time

Additional info:

https://github.com/openshift/operator-framework-olm/pull/595

Bug OCPBUGS-15599: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/image-customization-controller/pull/109

Bug OCPBUGS-18859: Update 4.15 ironic image to be consistent with ART

Please review the following PR: https://github.com/openshift/ironic-image/pull/397

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/ironic-image/pull/397

Bug OCPBUGS-19517: auto-generated documentation for microshift includes unsupported commands

Description of problem:

~~OSDOCS-7408~~ lists some commands to be removed from the documentation for MicroShift because they are not supported.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:

https://github.com/openshift/oc/pull/1548

Bug OCPBUGS-19118: Update 4.15 ose-cluster-platform-operators-manager image to be consistent with ART

Please review the following PR: https://github.com/openshift/platform-operators/pull/91

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/platform-operators/pull/91

Bug OCPBUGS-25395: [4.15] namespace port group is cleaned up on restart

Description of problem:

The problem was that, on initial sync, the namespace handler deleted all ports (because the logical port cache from which it got LSP UUIDs was not yet populated) and all ACLs (which were simply set to nil). Even though both ports and ACLs are re-added by the corresponding handlers, this may cause disruption.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1. Create a namespace with at least 1 pod and an egress firewall in it.

2. Pick any ovnkube-node pod and find the namespace port group UUID in the NBDB by external_ids["name"]=<namespace name>, e.g. for the "test" namespace:

_uuid               : 6142932d-4084-4bc3-bdcb-1990fc71891b
acls                : [ab2be619-1266-41c2-bb1d-1052cb4e1e97, b90a4b4a-ceee-41ee-a801-08c37a9bf3e7, d314fa8d-7b5a-40a5-b3d4-31091d7b9eae]
external_ids        : {name=test}
name                : a18007334074686647077
ports               : [55b700e4-8176-42e7-97a6-8b32a82fefe5, cb71739c-ad6c-4436-8fd6-0643a5417c7d, d8644bf1-6bed-4db7-abf8-7aaab0625324]

3. Restart the chosen ovn-k pod.

4. Check the logs after the restart for an update that sets the chosen port group to zero ports and zero ACLs:

Update operations generated as: [{Op:update Table:Port_Group Row:map[acls:{GoSet:[]} external_ids:{GoMap:map[name:test]} ports:{GoSet:[]}] Rows:[] Columns:[] Mutations:[] Timeout:<nil> Where:[where column _uuid == {6142932d-4084-4bc3-bdcb-1990fc71891b}] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUID: UUIDName:}]

Actual results:

Expected results:

On restart, the port group stays the same and no extra update with empty ports and ACLs is generated.
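The guard implied by the expected result can be sketched as follows, in Python for illustration. `PortGroup` and `namespace_port_group_update` are hypothetical simplifications of the NBDB row and the namespace handler (ovn-kubernetes itself is Go), and this is not necessarily how the linked PR fixes it. The idea is to write nothing derived from an unpopulated cache, to skip no-op updates, and to keep existing ACLs instead of resetting them:

```python
from dataclasses import dataclass, field


@dataclass
class PortGroup:
    """Simplified stand-in for an NBDB Port_Group row."""
    name: str
    ports: set = field(default_factory=set)
    acls: set = field(default_factory=set)


def namespace_port_group_update(pg, pod_names, lsp_cache, cache_synced):
    """Return the columns to write for pg, or None when no update is needed.

    pod_names: pods in the namespace; lsp_cache: pod name -> logical switch
    port UUID. Computing ports from an unpopulated cache yields an empty set,
    which is exactly the "zero ports" update seen in the log above.
    """
    if not cache_synced:
        return None  # don't write anything derived from an empty cache
    ports = {lsp_cache[p] for p in pod_names if p in lsp_cache}
    if ports == pg.ports:
        return None  # already correct: skip the redundant update
    # Preserve existing ACLs instead of resetting them to nil.
    return {"ports": ports, "acls": set(pg.acls)}
```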

Additional info:

https://github.com/openshift/ovn-kubernetes/pull/1998

Bug OCPBUGS-26410: Explore making that remote write failure less intrusive

This is a clone of issue OCPBUGS-22399. The following is the description of the original issue:
—

Users are encountering an issue when attempting to "Create hostedcluster on BM+disconnected+ipv6 through MCE." The issue is related to `--enable-uwm-telemetry-remote-write` defaulting to true. In the default disconnected case, the remote-write endpoint configured in the UWM configmap, e.g.:

  minBackoff: 1s
  url: https://infogw.api.openshift.com/metrics/v1/receive

is not reachable.

We should therefore look into reporting the issue and remediating it, rather than failing fatally, in disconnected scenarios.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:

In MCE 2.4, we currently document to disable `--enable-uwm-telemetry-remote-write` if the hosted control plane feature is used in a disconnected environment.

https://github.com/stolostron/rhacm-docs/blob/lahinson-acm-7739-disconnected-bare-[…]s/hosted_control_planes/monitor_user_workload_disconnected.adoc

Once this Jira is fixed, that documentation needs to be removed: users will no longer need to disable `--enable-uwm-telemetry-remote-write`. The HO is expected to fail gracefully on `--enable-uwm-telemetry-remote-write` and remain operational.
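One way to "fail gracefully" is to probe the endpoint and degrade with a warning instead of exiting. The sketch below is in Python for illustration; the flag name comes from the text above, while the function names and the probe itself are assumptions (the HyperShift operator is Go, and the actual fix may not probe at all):

```python
import http.client
import logging
import urllib.error
import urllib.request

logger = logging.getLogger("hypershift-operator")  # hypothetical logger name


def endpoint_reachable(url: str, timeout: float = 5.0) -> bool:
    """Best-effort probe of the remote-write URL; does not raise on network errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an HTTP error (e.g. 405 for GET on a write
        # endpoint), so the network path works.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError) as exc:
        logger.warning("remote write endpoint %s is unreachable (%s); "
                       "continuing without telemetry remote write", url, exc)
        return False


def enable_uwm_remote_write(flag_enabled: bool, url: str) -> bool:
    """Decide whether to configure remote write, degrading instead of exiting."""
    return flag_enabled and endpoint_reachable(url)
```

On a disconnected cluster, `enable_uwm_remote_write(True, "https://infogw.api.openshift.com/metrics/v1/receive")` would log a warning and return False rather than terminating the process.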

Task MON-3376: Remove deprecated --logtostderr argument of kube-rbac-proxy

The argument has been deprecated in the v0.14.0 release:

https://github.com/brancz/kube-rbac-proxy/releases/tag/v0.14.0

https://github.com/openshift/cluster-monitoring-operator/pull/2077

Bug OCPBUGS-19292: Update 4.15 ose-network-tools image to be consistent with ART

Please review the following PR: https://github.com/openshift/network-tools/pull/87

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/network-tools/pull/87

Bug OCPBUGS-19370: lack of hypershift labels for hcp components ovn,cloud-network-config,multus-admission controllers

Description of problem:

For the following HCP resources:
  "cloud-network-config-controller"
  "multus-admission-controller"
  "ovnkube-control-plane"

the label `hypershift.openshift.io/hosted-control-plane:{hostedcluster resource namespace}-{cluster-name}` is missing.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

100%

Steps to Reproduce:

1. Create a hosted cluster.
2. Check the labels of those resources,
e.g. `$ oc get pod multus-admission-controller-7c677c745c-l4dbc -oyaml`.

Alternatively, refer to test case ocp-44988.

Actual results:

The expected label is not found.

Expected results:

the pods have the label:
`hypershift.openshift.io/hosted-control-plane:{hostedcluster resource namespace}-{cluster-name}`
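A small checker for the expected label, sketched in Python. It assumes the input is the JSON from `oc get pods -n <hcp namespace> -o json`; the function name is hypothetical, and the label key and value format are taken from the text above:

```python
import json

LABEL_KEY = "hypershift.openshift.io/hosted-control-plane"


def pods_missing_hcp_label(pod_list_json: str, hc_namespace: str, cluster_name: str):
    """Return names of pods that lack the expected hosted-control-plane label.

    pod_list_json is the output of `oc get pods -n <hcp namespace> -o json`.
    """
    expected = f"{hc_namespace}-{cluster_name}"
    pods = json.loads(pod_list_json)["items"]
    return [
        pod["metadata"]["name"]
        for pod in pods
        if pod["metadata"].get("labels", {}).get(LABEL_KEY) != expected
    ]
```

Before the fix, the returned list would be expected to include the cloud-network-config-controller, multus-admission-controller and ovnkube-control-plane pods.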

Additional info:

https://github.com/openshift/cluster-network-operator/pull/2048

Bug OCPBUGS-20342: Flaky debug pod return code

Description of problem:

As part of the forbidden node label e2e test, we execute the `oc debug` command to set the forbidden labels on the node. The `oc debug` command is expected to fail while applying the forbidden label.

In our testing, we observed that even though the actual command on the node (kubectl label node/<node> <forbidden_label>) fails as expected, `oc debug` does not always propagate the return code correctly: it sometimes returns 0 even though `kubectl label` fails with an error.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

flaky

Steps to Reproduce:

1. Run the test at https://gist.github.com/harche/c9143c382cfe94d7836414d5ccc0ba45
2. Observe that sometimes it flakes at https://gist.github.com/harche/c9143c382cfe94d7836414d5ccc0ba45#file-test-go-L39

Actual results:

oc debug return value flakes

Expected results:

oc debug return value should be consistent.

Additional info:

https://github.com/openshift/oc/pull/1571

Bug OCPBUGS-21763: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cloud-provider-powervs/pull/44

Bug OCPBUGS-23308: vSphere ExcludeNetworkSubnetCIDR does not include fd69::2/128 for IPv6-only setups

As part of ~~OCPBUGS-18641~~, we added code that appends the internal OVN-Kubernetes subnet `fd69::2/128` to the `ExcludeNetworkSubnetCIDR` list for dual-stack installations.

We have now discovered that on IPv6-only clusters this subnet is missing from the list, even though it should be present.

This is causing vSphere IPv6-only setups to work incorrectly.
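The corrected condition can be sketched in Python: append the subnet whenever any cluster network is IPv6, rather than only for dual-stack. The function name and its inputs are hypothetical (the operator is Go), and the linked PR may structure this differently:

```python
import ipaddress

# Internal OVN-Kubernetes subnet named in this bug.
OVN_K_INTERNAL_V6 = "fd69::2/128"


def exclude_network_subnet_cidrs(cluster_cidrs, existing=()):
    """Return the exclude list, adding fd69::2/128 for any cluster with IPv6.

    Keying on "has an IPv6 network" rather than "is dual-stack" covers both
    dual-stack and IPv6-only clusters.
    """
    result = list(existing)
    has_ipv6 = any(ipaddress.ip_network(c, strict=False).version == 6 for c in cluster_cidrs)
    if has_ipv6 and OVN_K_INTERNAL_V6 not in result:
        result.append(OVN_K_INTERNAL_V6)
    return result
```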

https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/299

Bug OCPBUGS-21792: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/machine-api-provider-gcp/pull/62

Bug OCPBUGS-18892: ovn-ipsec pods CLBO when IPSec NS extension/svc is enabled

Description of problem:

ovn-ipsec pods crash when the IPSec NS extension/service is enabled on any $ROLE nodes.

The IPSec extension and service were enabled on 2 worker nodes only, and their corresponding ovn-ipsec pods are in CrashLoopBackOff (CLBO):

[root@dell-per740-36 ipsec]# oc get pods 
NAME                                       READY   STATUS             RESTARTS         AGE
dell-per740-14rhtsengpek2redhatcom-debug   1/1     Running            0                3m37s
ovn-ipsec-bptr6                            0/1     CrashLoopBackOff   26 (3m58s ago)   130m
ovn-ipsec-bv88z                            1/1     Running            0                3h5m
ovn-ipsec-pre414-6pb25                     1/1     Running            0                3h5m
ovn-ipsec-pre414-b6vzh                     1/1     Running            0                3h5m
ovn-ipsec-pre414-jzwcm                     1/1     Running            0                3h5m
ovn-ipsec-pre414-vgwqx                     1/1     Running            3                132m
ovn-ipsec-pre414-xl4hb                     1/1     Running            3                130m
ovn-ipsec-qb2bj                            1/1     Running            0                3h5m
ovn-ipsec-r4dfw                            1/1     Running            0                3h5m
ovn-ipsec-xhdpw                            0/1     CrashLoopBackOff   28 (116s ago)    132m
ovnkube-control-plane-698c9845b8-4v58f     2/2     Running            0                3h5m
ovnkube-control-plane-698c9845b8-nlgs8     2/2     Running            0                3h5m
ovnkube-control-plane-698c9845b8-wfkd4     2/2     Running            0                3h5m
ovnkube-node-l6sr5                         8/8     Running            27 (66m ago)     130m
ovnkube-node-mj8bs                         8/8     Running            27 (75m ago)     132m
ovnkube-node-p24x8                         8/8     Running            0                178m
ovnkube-node-rlpbh                         8/8     Running            0                178m
ovnkube-node-wdxbg                         8/8     Running            0                178m
[root@dell-per740-36 ipsec]#

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-09-12-024050

How reproducible:

Always

Steps to Reproduce:

1. Install an OVN IPSec cluster (East-West).
2. Enable the IPSec OS extension for North-South.
3. Enable the IPSec service for North-South.

Actual results:

ovn-ipsec pods in CLBO state

Expected results:

All pods in the ovn-kubernetes namespace should be Running.

Additional info:

Logs from one of the ovn-ipsec pods in CLBO:

# oc logs ovn-ipsec-bptr6
Defaulted container "ovn-ipsec" out of: ovn-ipsec, ovn-keys (init)
+ rpm --dbpath=/usr/share/rpm -q libreswan
libreswan-4.9-4.el9_2.x86_64
+ counter=0
+ '[' -f /etc/cni/net.d/10-ovn-kubernetes.conf ']'
+ echo 'ovnkube-node has configured node.'
ovnkube-node has configured node.
+ ip x s flush
+ ip x p flush
+ ulimit -n 1024
+ /usr/libexec/ipsec/addconn --config /etc/ipsec.conf --checkconfig
+ /usr/libexec/ipsec/_stackmanager start
+ /usr/sbin/ipsec --checknss
+ /usr/libexec/ipsec/pluto --leak-detective --config /etc/ipsec.conf --logfile /var/log/openvswitch/libreswan.log
FATAL ERROR: /usr/libexec/ipsec/pluto: lock file "/run/pluto/pluto.pid" already exists
leak: string logger, item size: 48
leak: string logger prefix, item size: 27
leak detective found 2 leaks, total size 75

journalctl -u ipsec here: https://privatebin.corp.redhat.com/?216142833d016b3c#2Es8ACSyM3VWvwi85vTaYtSx8X3952ahxCvSHeY61UtT
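The fatal error above is pluto refusing to start because "/run/pluto/pluto.pid" already exists. One possible mitigation, sketched here in Python and not necessarily what the linked PR does, is to detect a stale PID file (one that does not name a live process) and remove it before starting pluto. The function names are hypothetical. Note that inside a container with its own PID namespace, a PID left by a previous container could coincide with an unrelated live process, so this is only a heuristic:

```python
import os


def pid_file_is_stale(path: str) -> bool:
    """True if path exists but does not name a live process."""
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return False  # no lock file at all
    except ValueError:
        return True  # empty or garbage contents
    if pid <= 0:
        return True  # never a valid daemon PID; also avoids signalling a process group
    try:
        os.kill(pid, 0)  # signal 0: only checks that the process exists
    except ProcessLookupError:
        return True
    except PermissionError:
        return False  # process exists but belongs to another user
    return False


def remove_stale_pid_file(path: str) -> bool:
    """Delete path if it is stale; return True if it was removed."""
    if pid_file_is_stale(path):
        os.remove(path)
        return True
    return False
```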

https://github.com/openshift/cluster-network-operator/pull/1999

Bug OCPBUGS-22018: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-dns-operator/pull/387

Bug OCPBUGS-22560: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/azure-disk-csi-driver-operator/pull/107

Bug OCPBUGS-24267: Cluster configuration fields are not visible

Cluster configuration page fields are not visible.

Screenshot: https://drive.google.com/file/d/17TrZNE2dY-AH-vUwcsjvC4E8wxiyPb9n/view?usp=drive_link

https://github.com/openshift/console/pull/13386

Bug OCPBUGS-19017: dnsmasq failing to start on bootstrap VM

dnsmasq isn't starting on okd-scos in the bootstrap VM.

The logs show it failing with "Operation not permitted".

https://github.com/openshift/installer/pull/7487

Bug OCPBUGS-21584: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/kubernetes/pull/1757

Bug OCPBUGS-19205: Update 4.15 ose-cluster-cloud-controller-manager-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/278

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/278

Bug OCPBUGS-24082: Update 4.15 ose-cluster-dns-operator-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-dns-operator/pull/396

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-dns-operator/pull/396

Bug OCPBUGS-24241: RHOCP installation on RHOSP fails with an error "Incompatible openstacksdk library found"

RHOCP installation on RHOSP fails with an error:

~~~

$ ansible-playbook -i inventory.yaml security-groups.yaml
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Incompatible openstacksdk library found: Version MUST be >=1.0 and <=None, but 0.36.5 is smaller than minimum version 1.0."}

~~~

Packages installed:

ansible-2.9.27-1.el8ae.noarch Fri Oct 13 06:56:05 2023
python3-netaddr-0.7.19-8.el8.noarch Fri Oct 13 06:55:44 2023
python3-openstackclient-4.0.2-2.20230404115110.54bf2c0.el8ost.noarch Tue Nov 21 01:38:32 2023
python3-openstacksdk-0.36.5-2.20220111021051.feda828.el8ost.noarch Fri Oct 13 06:55:52 2023

Document followed:
https://docs.openshift.com/container-platform/4.13/installing/installing_openstack/installing-openstack-user.html#installation-osp-downloading-modules_installing-openstack-user
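A pre-flight check mirroring the failing requirement ("Version MUST be >=1.0") can be sketched in Python, assuming it is run with the same interpreter Ansible uses. The function names are hypothetical; this only reproduces the comparison from the error message and is not the linked fix:

```python
import re
from importlib import metadata

# Minimum from the error message: "Version MUST be >=1.0".
MINIMUM_OPENSTACKSDK = (1, 0, 0)


def version_tuple(version: str, width: int = 3) -> tuple:
    """'0.36.5' -> (0, 36, 5); '1.0' -> (1, 0, 0); trailing text such as an RPM release is ignored."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    numbers = [int(n) for n in match.group(0).split(".")][:width] if match else []
    return tuple(numbers + [0] * (width - len(numbers)))


def openstacksdk_is_new_enough(installed=None) -> bool:
    """Check the installed openstacksdk (or an explicit version string)."""
    if installed is None:
        try:
            installed = metadata.version("openstacksdk")
        except metadata.PackageNotFoundError:
            return False
    return version_tuple(installed) >= MINIMUM_OPENSTACKSDK
```

For the package listed above (python3-openstacksdk 0.36.5), `openstacksdk_is_new_enough("0.36.5")` is False, matching the playbook failure.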

https://github.com/openshift/installer/pull/7821

Bug OCPBUGS-17218: GCP Shared VPC installation does not log when it cannot create firewall rules

Description of problem:

When installing OpenShift on GCP in a Shared VPC (formerly XPN) configuration, the service account used must have permission to create firewall rules on the host project's network in order to proceed. If the account lacks this permission, the installation fails, but the explicit reason is not reported.

Version-Release number of selected component (if applicable):

4.14-ec.1

How reproducible:

100% of the time when the service account creating the cluster does not have Owner permissions or `compute.firewalls.create` on the host project.

Steps to Reproduce:

1. Follow instructions at https://docs.openshift.com/container-platform/4.13/installing/installing_gcp/installing-gcp-shared-vpc.html
2. As part of the prerequisites, make a service account with the permissions listed at https://docs.openshift.com/container-platform/4.13/installing/installing_gcp/installing-gcp-account.html#minimum-required-permissions-ipi-gcp-xpn
3. Create a cluster using an install-config.yaml similar to the one included below

Actual results:

The cluster fails to bootstrap. The bootstrap node will be present, as will the masters, but components will not be able to reach the api-int load balancer.

Expected results:

The log files would include an error message regarding the missing permissions, and possibly abort the installation early.

Additional info:

https://docs.openshift.com/container-platform/4.13/installing/installing_gcp/installing-gcp-account.html#minimum-required-permissions-ipi-gcp-xpn does not list the `compute.firewalls.create` permission, which is included in the code at https://github.com/openshift/installer/blob/4f59664588c4472b7aba2838159651e729908dff/pkg/asset/cluster/tfvars.go#L79.
This probably also calls for a related docs improvement.
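The "abort early with an explicit error" expectation can be sketched as a pre-flight check in Python. The function and exception names are hypothetical and the installer itself is Go; the `granted` set is assumed to come from something like the Cloud Resource Manager `projects.testIamPermissions` method run against the host project. Only the permission discussed in this bug is listed:

```python
class MissingPermissionsError(Exception):
    """Raised before provisioning when the service account lacks permissions."""


# Only the permission discussed in this bug; a real check would cover the full
# documented list for Shared VPC installs.
HOST_PROJECT_PERMISSIONS = frozenset({"compute.firewalls.create"})


def check_host_project_permissions(host_project, granted, required=HOST_PROJECT_PERMISSIONS):
    """Fail early, naming every missing permission on the host project."""
    missing = sorted(set(required) - set(granted))
    if missing:
        raise MissingPermissionsError(
            f"service account is missing {', '.join(missing)} on host project "
            f"{host_project!r}; firewall rules for the shared VPC cannot be created"
        )
```

With the install-config below, the host project would be `openshift-dev-installer` (the `networkProjectID`).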

File attachment seems to have been disabled, so here is the text of the `install-config.yaml` that I was using:

additionalTrustBundlePolicy: Proxyonly
apiVersion: v1
baseDomain: installer.gcp.devcluster.openshift.com
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform: {}
  replicas: 3
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform: {}
  replicas: 3
credentialsMode: Passthrough
featureSet: TechPreviewNoUpgrade
metadata:
  creationTimestamp: null
  name: nrbxpn
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
platform:
  gcp:
    projectID: openshift-installer-shared-vpc
    region: us-central1
    network: installer-shared-vpc
    computeSubnet: installer-shared-vpc-subnet-1
    controlPlaneSubnet: installer-shared-vpc-subnet-2
    networkProjectID: openshift-dev-installer
publish: Internal
pullSecret: <omitted>
sshKey: <omitted>

https://github.com/openshift/installer/pull/7417

Bug OCPBUGS-21610: Monitoring-plugin can not start on IPv6 disabled cluster

Description of problem:

The monitoring-plugin pod cannot start on a cluster with IPv6 disabled because it listens on [::]:9443.

Monitoring-plugin should listen on [::]:9443 on an IPv6-enabled cluster.
Monitoring-plugin should listen on 0.0.0.0:9443 on an IPv6-disabled cluster.

$ oc logs monitoring-plugin-dc84478c-5rwmm
2023/10/14 13:42:41 [emerg] 1#0: socket() [::]:9443 failed (97: Address family not supported by protocol)
nginx: [emerg] socket() [::]:9443 failed (97: Address family not supported

Version-Release number of selected component (if applicable):

4.14.0-rc.5

How reproducible:

Always

Steps to Reproduce:

1) Disable IPv6 by following https://access.redhat.com/solutions/5513111:

cat <<EOF | oc create -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-openshift-machineconfig-master-kargs
spec:
  kernelArguments:
  - ipv6.disable=1
EOF

cat <<EOF | oc create -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-openshift-machineconfig-worker-kargs
spec:
  kernelArguments:
  - ipv6.disable=1
EOF

2) Check the mcp status

3) Check the monitoring plugin pod status

Actual results:
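1) The worker MCP does not finish updating because the monitoring-plugin pod cannot be rescheduled; the node drain keeps failing because evicting the pod would violate its disruption budget.

$ oc get mcp | grep worker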
worker   rendered-worker-ba1d1b8306f65bc5ff53b0c05a54143f   False     True       False      5              3                   3                     0                      3h59m

$ oc logs machine-config-controller-5b96788c69-j9d7k
I1014 13:05:57.767217       1 drain_controller.go:350] Previous node drain found. Drain has been going on for 0.025260005567777778 hours
I1014 13:05:57.767228       1 drain_controller.go:173] node anlim14-c6jbb-worker-b-rgqq5.c.openshift-qe.internal: initiating drain
E1014 13:05:58.411241       1 drain_controller.go:144] WARNING: ignoring DaemonSet-managed ……
I1014 13:05:58.413116       1 drain_controller.go:144] evicting pod openshift-monitoring/monitoring-plugin-dc84478c-92xr4
E1014 13:05:58.422164       1 drain_controller.go:144] error when evicting pods/"monitoring-plugin-dc84478c-92xr4" -n "openshift-monitoring" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I1014 13:06:03.422338       1 drain_controller.go:144] evicting pod openshift-monitoring/monitoring-plugin-dc84478c-92xr4
E1014 13:06:03.433295       1 drain_controller.go:144] error when evicting pods/"monitoring-plugin-dc84478c-92xr4" -n "openshift-monitoring" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.

2) The monitoring-plugin pod listens on [::], which is an invalid address on a cluster with IPv6 disabled:

$ oc extract cm/monitoring-plugin
$ cat nginx.conf
error_log /dev/stdout info;
events {}
http {
  include            /etc/nginx/mime.types;
  default_type       application/octet-stream;
  keepalive_timeout  65;
  server {
    listen              9443 ssl;
    listen              [::]:9443 ssl;
    ssl_certificate     /var/cert/tls.crt;
    ssl_certificate_key /var/cert/tls.key;
    root                /usr/share/nginx/html;
  }
}

Expected results:

Monitoring-plugin listens on [::]:9443 on IPv6 enabled cluster

Monitoring-plugin listens on 0.0.0.0:9443 on IPv6 disabled cluster.

Additional info:

For reference, this PR shows how cluster-logging fixed the same issue: https://github.com/openshift/cluster-logging-operator/pull/2207/files#diff-dc6205a02c6c783e022ae0d4c726327bee4ef34cd1361541d1e3165ee7056b38R43
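The expected behavior (always listen on IPv4, add the [::] listener only when IPv6 works) can be sketched in Python. This is an illustration under assumptions: the function names are hypothetical, and the detection has to run where nginx runs (for example in a container entrypoint), which may not match how the monitoring plugin's nginx.conf is actually generated:

```python
import socket


def ipv6_supported() -> bool:
    """False when the kernel rejects AF_INET6 sockets, e.g. with ipv6.disable=1."""
    if not socket.has_ipv6:
        return False
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM):
            return True
    except OSError:  # errno 97 (EAFNOSUPPORT), as in the nginx log above
        return False


def nginx_listen_directives(port: int, ipv6: bool) -> list:
    """Always listen on IPv4; add the [::] listener only when IPv6 works."""
    directives = [f"listen {port} ssl;"]
    if ipv6:
        directives.append(f"listen [::]:{port} ssl;")
    return directives
```

`nginx_listen_directives(9443, ipv6_supported())` yields only the IPv4 listener on a node booted with `ipv6.disable=1`.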

Bug OCPBUGS-22946: verbose prometheus-operator-admission-webhook logs

Description of problem:

This issue was found while verifying ~~OCPBUGS-21637~~: the prometheus-operator-admission-webhook logs are very verbose.

$ oc -n openshift-monitoring get pod -l app.kubernetes.io/name=prometheus-operator-admission-webhook
NAME                                                     READY   STATUS    RESTARTS   AGE
prometheus-operator-admission-webhook-5d96cbcbfc-6lx4m   1/1     Running   0          56m
prometheus-operator-admission-webhook-5d96cbcbfc-jj66x   1/1     Running   0          53m

$ oc -n openshift-monitoring logs prometheus-operator-admission-webhook-5d96cbcbfc-6lx4m
level=info ts=2023-11-06T01:50:33.617049649Z caller=main.go:140 address=[::]:8443 msg="Starting TLS enabled server" http2=false
ts=2023-11-06T01:50:34.601774794Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:50:40.439015896Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:50:40.43925044Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:50:50.437745065Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:50:50.448362455Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:51:00.428162615Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:51:00.428571968Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:51:10.426317894Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:51:10.426769416Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:51:20.426701853Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:51:20.427289877Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:51:30.429156675Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:51:30.429229042Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:51:40.426522527Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:51:40.427038656Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:51:50.428974832Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:51:50.429036156Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:52:00.428747039Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:52:00.42880275Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:52:10.426871896Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:52:10.428574666Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:52:20.428211529Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:52:20.428638108Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:52:30.427148775Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:52:30.427631515Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:52:40.427167231Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:52:40.427658789Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:52:50.427851476Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:52:50.428319729Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:53:00.428583783Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:53:00.429083642Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:53:10.426258718Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:53:10.426788637Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:53:20.430876533Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:53:20.431510269Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:53:30.427527316Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:53:30.428046481Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:53:40.428449342Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:53:40.428886681Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:53:50.426513473Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:53:50.427038956Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:54:00.426639171Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:54:00.427164997Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:54:10.426804033Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:54:10.427276217Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:54:20.427705297Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:54:20.428214309Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:54:30.428041006Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:54:30.428525809Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:54:40.426257489Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:54:40.42674803Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:54:50.42708913Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:54:50.427155482Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:55:00.428431788Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:55:00.428881681Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:55:10.429549989Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:55:10.429618004Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:55:20.427741192Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:55:20.428196221Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:55:30.4269946Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:55:30.427451901Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:55:40.426994787Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:55:40.427502475Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:55:50.426456346Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:55:50.426610051Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:56:00.426520596Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:56:00.426676076Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:56:10.435077603Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:56:10.435135319Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:56:20.427693249Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:56:20.428171589Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:56:30.428760772Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:56:30.428828762Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:56:40.428545666Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:56:40.429005303Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:56:50.426103842Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:56:50.426578009Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:57:00.427041793Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:57:00.427482797Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:57:10.427963834Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:57:10.428440451Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:57:20.428877932Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:57:20.428945521Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:57:30.426157935Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:57:30.426639545Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:57:40.42875961Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:57:40.42884264Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:57:50.426450177Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:57:50.426939532Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:58:00.428456873Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:58:00.428904131Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:58:10.428931448Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:58:10.428987646Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:58:20.429377819Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:58:20.4294396Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:58:30.428108184Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:58:30.428580595Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:58:40.426962512Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:58:40.427429076Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:58:50.429177401Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:58:50.429637834Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:59:00.428197981Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:59:00.428655487Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:59:10.426418388Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:59:10.426908577Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:59:20.426705875Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:59:20.427197531Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:59:30.427909675Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:59:30.428395421Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:59:40.429100447Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:59:40.429871853Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:59:50.4268663Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:59:50.427329161Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:00:00.429149297Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:00:00.429205811Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:00:10.426857098Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:00:10.427290243Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:00:20.42638474Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:00:20.426901703Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:00:30.428885162Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:00:30.429373666Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:00:40.427093878Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:00:40.427622056Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:00:50.428691098Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:00:50.428743261Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:01:00.426355861Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:01:00.42685464Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:01:10.426208743Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:01:10.426710363Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:01:20.426872491Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:01:20.42731801Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:01:30.426612427Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:01:30.427084214Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:01:40.428796629Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:01:40.429400491Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:01:50.427001992Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:01:50.42827597Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:02:00.428013056Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:02:00.428469744Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:02:10.426711057Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:02:10.427247058Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:02:20.429136255Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:02:20.429208369Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:02:30.427158806Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:02:30.427593326Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:02:40.426389918Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:02:40.426875768Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:02:50.429551365Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:02:50.429619241Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:03:00.426621326Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:03:00.427126079Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:03:10.426301507Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:03:10.426803336Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:03:13.952615577Z caller=stdlib.go:105 caller=server.go:3215 msg="http: TLS handshake error from 10.130.0.1:52552: EOF"
ts=2023-11-06T02:03:20.426371089Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:03:20.426852234Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:03:30.428789504Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:03:30.428874536Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:03:40.427028458Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:03:40.427463333Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:03:50.429615112Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:03:50.429679407Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:04:00.4285878Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:04:00.429074488Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:04:10.4279579Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:04:10.428403727Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:04:20.426433063Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:04:20.426940057Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:04:30.428317498Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:04:30.428730147Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:04:40.42911069Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:04:40.429194383Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:04:50.42820753Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:04:50.428643464Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:05:00.427890872Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T02:05:00.428356508Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from 
...

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2023-11-04-120954

How reproducible:

always

Steps to Reproduce:

1. check prometheus-operator-admission-webhook logs

Actual results:

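Verbose prometheus-operator-admission-webhook logs: "http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" is logged twice every 10 seconds.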

https://github.com/openshift/prometheus-operator/pull/254

Bug OCPBUGS-26607: CVO does not reconcile metadata on ClusterOperators

This is a clone of issue OCPBUGS-26014. The following is the description of the original issue:
—

Description of problem:

While testing oc adm upgrade status against build02 (b02), I noticed that some ClusterOperators (COs) have no annotations at all, although I expected them to carry the include/exclude.release.openshift.io/* annotations that identify COs coming from the release payload.

$ b02 get clusteroperator etcd -o jsonpath={.metadata.annotations}
$ ota-stage get clusteroperator etcd -o jsonpath={.metadata.annotations}
{"exclude.release.openshift.io/internal-openshift-hosted":"true","include.release.openshift.io/self-managed-high-availability":"true","include.release.openshift.io/single-node-developer":"true"}

The CVO does not reconcile CO resources: it only precreates them and does not touch them once they exist. Build02 has no COs with reconciled metadata because it was born as 4.2, which (AFAIK) predates OCP's use of the include/exclude annotations.

Version-Release number of selected component (if applicable):

4.16 (development branch)

How reproducible:

deterministic

Steps to Reproduce:

1. delete an annotation on a ClusterOperator resource

Actual results:

The annotation is not recreated.

Expected results:

The annotation should be recreated

https://github.com/openshift/cluster-version-operator/pull/1017
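
A minimal sketch of the expected behavior, assuming the CVO merges the annotations from the payload manifest into the existing ClusterOperator on every sync instead of only when it precreates the object. ensureAnnotations is a hypothetical helper, not the CVO's code or the change in the linked PR; the annotation keys are the ones shown on ota-stage above.

```go
package main

import (
	"fmt"
	"sort"
)

// ensureAnnotations is a hypothetical reconcile step: it copies every
// annotation required by the payload manifest onto a copy of the existing
// annotations, overwriting drifted values, and reports whether anything
// changed. Annotations not listed in required are left untouched.
func ensureAnnotations(existing, required map[string]string) (map[string]string, bool) {
	merged := make(map[string]string, len(existing)+len(required))
	for k, v := range existing {
		merged[k] = v
	}
	modified := false
	for k, v := range required {
		if cur, ok := merged[k]; !ok || cur != v {
			merged[k] = v
			modified = true
		}
	}
	return merged, modified
}

func main() {
	required := map[string]string{
		"exclude.release.openshift.io/internal-openshift-hosted":      "true",
		"include.release.openshift.io/self-managed-high-availability": "true",
		"include.release.openshift.io/single-node-developer":          "true",
	}
	// An old ClusterOperator with no annotations, like etcd on build02.
	existing := map[string]string{}

	merged, modified := ensureAnnotations(existing, required)
	keys := make([]string, 0, len(merged))
	for k := range merged {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Println("modified:", modified)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, merged[k])
	}

	// A second pass over the already-reconciled object is a no-op.
	_, modified = ensureAnnotations(merged, required)
	fmt.Println("modified on second pass:", modified)
}
```

Merging rather than replacing the map keeps annotations that other actors may have added, and the modified flag lets a caller skip the API update when nothing has drifted.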

Bug OCPBUGS-19255: Update 4.15 ose-vsphere-cloud-controller-manager image to be consistent with ART

Please review the following PR: https://github.com/openshift/cloud-provider-vsphere/pull/48

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to be reopened automatically.

https://github.com/openshift/cloud-provider-vsphere/pull/48

Bug OCPBUGS-25233: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/alibaba-disk-csi-driver-operator/pull/77

Bug OCPBUGS-19240: Update 4.15 ose-cluster-capi-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-capi-operator/pull/129

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to be reopened automatically.

https://github.com/openshift/cluster-capi-operator/pull/129

Bug OCPBUGS-23161: cluster-network-operator does not emit logs from logr

See log:

[controller-runtime] log.SetLogger(...) was never called, logs will not be displayed:
goroutine 422 [running]:
runtime/debug.Stack()
runtime/debug/stack.go:24 +0x65
sigs.k8s.io/controller-runtime/pkg/log.eventuallyFulfillRoot()
sigs.k8s.io/controller-runtime@v0.15.0/pkg/log/log.go:59 +0xbd
sigs.k8s.io/controller-runtime/pkg/log.(*delegatingLogSink).WithName(0xc000521140, {0x2d0b2ef, 0x14})
sigs.k8s.io/controller-runtime@v0.15.0/pkg/log/deleg.go:147 +0x4c
github.com/go-logr/logr.Logger.WithName({{0x31b3e78, 0xc000521140}, 0x0}, {0x2d0b2ef?, 0x40?})
github.com/go-logr/logr@v1.2.4/logr.go:336 +0x46
sigs.k8s.io/controller-runtime/pkg/client.newClient(0xc000471440, {0x0, 0x0, {0x31b5c00, 0xc000eb3100}, 0x0, {0x0, 0x0}, 0x0})
sigs.k8s.io/controller-runtime@v0.15.0/pkg/client/client.go:115 +0xb4
sigs.k8s.io/controller-runtime/pkg/client.New(0x319b2b0?, {0x0, 0x0, , 0x0, {0x0, 0x0}, 0x0})
sigs.k8s.io/controller-runtime@v0.15.0/pkg/client/client.go:101 +0x85
github.com/openshift/cluster-network-operator/pkg/client.NewClusterClient(0xc000471440, 0xc000499b00)
github.com/openshift/cluster-network-operator/pkg/client/client.go:188 +0x2b0
github.com/openshift/cluster-network-operator/pkg/client.NewClient(0x0?, 0x0?, {0x2cecdf7, 0x7}, 0x0?)
github.com/openshift/cluster-network-operator/pkg/client/client.go:100 +0xa5
github.com/openshift/cluster-network-operator/pkg/operator.RunOperator({0x31ace70, 0xc0009a0b90}, 0xc000318a40, {0x2cecdf7, 0x7}, 0x0?)
github.com/openshift/cluster-network-operator/pkg/operator/operator.go:46 +0xbd
main.newNetworkOperatorCommand.func2({0x31ace70?, 0xc0009a0b90?}, 0x31acee0?)
github.com/openshift/cluster-network-operator/cmd/cluster-network-operator/main.go:49 +0x3b
github.com/openshift/library-go/pkg/controller/controllercmd.ControllerBuilder.getOnStartedLeadingFunc.func1.1()
github.com/openshift/library-go@v0.0.0-20230503144409-4cb26a344c37/pkg/controller/controllercmd/builder.go:351 +0x74
created by github.com/openshift/library-go/pkg/controller/controllercmd.ControllerBuilder.getOnStartedLeadingFunc.func1
github.com/openshift/library-go@v0.0.0-20230503144409-4cb26a344c37/pkg/controller/controllercmd/builder.go:349 +0x10a

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-hypershift-release-4.15-periodics-e2e-aws-ovn-conformance/1722551726378061824/artifacts/e2e-aws-ovn-conformance/dump/artifacts/namespaces/clusters-dfc32df54edf6e3b2a2e/core/pods/logs/cluster-network-operator-d79876885-l5h6b-cluster-network-operator.log

I think we want these logs to be emitted.

https://github.com/openshift/cluster-network-operator/pull/2129

Bug OCPBUGS-23458: OCP 4.14 Installation fails in environments where S3 versioning is enforced

View the Description View the linked PRs

Description of problem:

OCP 4.14 installation fails in AWS environments where S3 versioning is enforced. OCP 4.13 installs successfully in the same environment.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Use any native AWS mechanism to enforce versioning on S3; AWS Config is the easiest. This enables versioning on S3 buckets after they are created.
2. Install OCP 4.13 on AWS just using the defaults. It will succeed.
3. Install OCP 4.14 on AWS just using the defaults. It will fail.

Actual results:

OCP 4.14 installation fails fatally.

Expected results:

OCP 4.14 installation succeeds, just like the OCP 4.13 installation.
Alternatively, if the defaults have changed, the change is documented.

Additional info:

1. Related 4.14 feature : https://docs.openshift.com/container-platform/4.14/release_notes/ocp-4-14-release-notes.html#ocp-4-14-aws-s3-deletion - provides the ability to skip deletion of S3 buckets altogether. 
2. Attached OCP logs.
3. Strategic enterprise customers of managed services use data governance policies that enforce versioning, bucket policies, etc., and these customers are blocked from installing.

https://github.com/openshift/installer/pull/7791

Bug OCPBUGS-24582: Functions list page always shows the create project page

View the Description View the linked PRs

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/console/pull/13419

Bug OCPBUGS-23398: HyperShift AWS KMS Backup key ARN incorrect

View the Description View the linked PRs

Description of problem:

AWS KMS on HyperShift uses two UNIX sockets through which the KMS plugins are run. Each UNIX socket should connect to an independent KMS instance, i.e. one with its own AWS ARN. However, both the active KMS socket and the backup KMS socket currently use the same ARN, which means the backup KMS instance is never used.

Version-Release number of selected component (if applicable):

HyperShift - main branch (PR #423)
GitHub indicates all the following hypershift versions would be affected.
v0.1.15, v0.1.14, v0.1.13,  v0.1.12, v0.1.11, v0.1.10, v0.1.9, v0.1.8, v0.1.7, v0.1.6, v0.1.5, v0.1.4, v0.1.3, v0.1.2, v0.1.1, v0.1.0, 2.0.0-20220406093220, 2.0.0-20220323110745, 2.0.0-20220319120001, 2.0.0-20220317155435

How reproducible:

Always

Steps to Reproduce:

1. Create a HyperShift cluster
2. Check whether the backup KMS instance is ever used

Actual results:

Active KMS instance's ARN is used even by the backup KMS socket

Expected results:

The backup KMS socket should use its own backupKey.ARN

Additional info:

https://github.com/openshift/hypershift/blob/main/control-plane-operator/controllers/hostedcontrolplane/kas/aws_kms.go#L119

The function call there should use backupKey.ARN instead of activeKey.ARN.
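
The fix is a one-argument change, but a small model makes the intent explicit. The sketch below is not the HyperShift code in aws_kms.go: kmsKey, kmsSocketArgs, buildKMSArgs, the socket paths and the flag names are all hypothetical. It only shows each plugin socket being built from its own key, so the backup socket receives the backup ARN.

```go
package main

import "fmt"

// kmsKey is a hypothetical stand-in for a KMS key entry in the cluster spec.
type kmsKey struct {
	ARN string
}

// kmsSocketArgs is a hypothetical helper that builds the arguments for one
// KMS plugin instance listening on the given UNIX socket.
func kmsSocketArgs(socketPath string, key kmsKey) []string {
	return []string{
		fmt.Sprintf("--listen=%s", socketPath),
		fmt.Sprintf("--key=%s", key.ARN),
	}
}

// buildKMSArgs returns the arguments for the active and backup plugins.
// The reported bug is equivalent to passing activeKey for both sockets;
// the backup socket must be built from backupKey.
func buildKMSArgs(activeKey, backupKey kmsKey) (active, backup []string) {
	active = kmsSocketArgs("/var/run/awskmsactive.sock", activeKey)
	backup = kmsSocketArgs("/var/run/awskmsbackup.sock", backupKey)
	return active, backup
}
```

Keeping the key as an explicit parameter per socket, rather than reading a shared variable inside the helper, makes this kind of copy-paste mix-up visible at the call site.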

https://github.com/openshift/hypershift/pull/3216

Bug OCPBUGS-19208: Update 4.15 ose-nutanix-machine-controllers image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/machine-api-provider-nutanix/pull/51

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/machine-api-provider-nutanix/pull/51

Bug OCPBUGS-20266: [AWS] Unit tests have deadlock condition in termination handler

View the Description View the linked PRs

Description of problem:

Because of the way the termination handler unit tests are configured, the counter of HTTP requests to the mock handler can, in some cases, cause the test to deadlock and time out. This happens randomly, because the ordering of the tests affects when the bug occurs.

Version-Release number of selected component (if applicable):

4.13+

How reproducible:

It happens randomly when run in CI or when the full suite is run, but it happens every time if the tests are focused.
Focusing on "poll URL cannot be reached" triggers the deadlock.

Steps to Reproduce:

1. add `-focus "poll URL cannot be reached"` to unit test ginkgo arguments
2. run `make unit`

Actual results:

The test suite hangs after this output:
"Handler Suite when running the handler when polling the termination endpoint and the poll URL cannot be reached should return an error /home/mike/dev/machine-api-provider-aws/pkg/termination/handler_test.go:197"

Expected results:

Tests pass

Additional info:

To fix this, the test needs to be isolated in its own Context block. This patch should do the trick:

diff --git a/pkg/termination/handler_test.go b/pkg/termination/handler_test.go
index 2b98b08b..0f85feae 100644
--- a/pkg/termination/handler_test.go
+++ b/pkg/termination/handler_test.go
@@ -187,7 +187,9 @@ var _ = Describe("Handler Suite", func() {
                                        Consistently(nodeMarkedForDeletion(testNode.Name)).Should(BeFalse())
                                })
                        })
+               })
 
+               Context("when the termination endpoint is not valid", func() {
                        Context("and the poll URL cannot be reached", func() {
                                BeforeEach(func() {
                                        nonReachable := "abc#1://localhost"

https://github.com/openshift/machine-api-provider-aws/pull/84

Bug OCPBUGS-25132: dual-stack UPI: IPv6 security group rules created for single-stack cluster

View the Description View the linked PRs

Description of problem:

The security-groups.yaml playbook runs the tasks that create the IPv6 security group rules regardless of the os_subnet6 value.
The when clause does not consider the os_subnet6 [1] value, so the tasks always run.

It works with the following guard on the block:

  - name: 'Create security groups for IPv6'
    block:
    - name: 'Create master-sg IPv6 rule "OpenShift API"'
    [...]
    when: os_subnet6 is defined

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2023-12-11-033133

How reproducible:

Always

Steps to Reproduce:

1. Do not set os_subnet6 in the inventory file [2] (so the cluster is not dual-stack)
2. Deploy 4.15 UPI by running the UPI playbooks

Actual results:

IPv6 security group rules are created

Expected results:

IPv6 security group rules shouldn't be created

Additional info:
[1] https://github.com/openshift/installer/blob/46fd66272538c350327880e1ed261b70401b406e/upi/openstack/security-groups.yaml#L375
[2] https://github.com/openshift/installer/blob/46fd66272538c350327880e1ed261b70401b406e/upi/openstack/inventory.yaml#L77

https://github.com/openshift/installer/pull/7833

Bug OCPBUGS-16762: HAProxy log length takes only one destination type when both syslog and container are configured on the default ingress controller

View the Description View the linked PRs

Description of problem:

During testing of the NE1264 epic, I configured both the syslog and the container logging destination types on the same default ingress controller. The ingress controller spec shows both destination types, but only one of them is reflected in the ROUTER_LOG_MAX_LENGTH env variable and the haproxy.config file.

melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator get ingresscontroller/default -oyaml
apiVersion: operator.openshift.io/v1
kind: IngressController
<-----snip--->
spec:
  clientTLS:
    clientCA:
      name: ""
    clientCertificatePolicy: ""
  httpCompression: {}
  httpEmptyRequestsPolicy: Respond
  httpErrorCodePages:
    name: ""
  logging:
    access:
      destination:
        container:
          maxLength: 1024
        syslog:
          address: 1.2.3.4
          maxLength: 1024
          port: 514
        type: Container
      logEmptyRequests: Log
  replicas: 2
  tuningOptions:
    reloadInterval: 0s
  unsupportedConfigOverrides: null



melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress exec router-default-6c86ff75d9-g24q5    -- env | grep ROUTER_LOG_MAX_LENGTH
Defaulted container "router" out of: router, logs
ROUTER_LOG_MAX_LENGTH=1024
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress exec router-default-6c86ff75d9-l9rjv -- cat haproxy.config | grep 1024  
Defaulted container "router" out of: router, logs
  log /var/lib/rsyslog/rsyslog.sock len 1024 local1 info


When the log length is patched, the change is not reflected as expected for one of the destinations.

melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator patch ingresscontroller/default -p '{"spec":{"logging":{"access":{"destination":{"container":{"maxLength":480}}}}}}' --type=merge
ingresscontroller.operator.openshift.io/default patched
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress exec router-default-6476d6c69d-tlhqd -- env | grep ROUTER_LOG_MAX_LENGTH    
Defaulted container "router" out of: router, logs
ROUTER_LOG_MAX_LENGTH=480


melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator patch ingresscontroller/default -p '{"spec":{"logging":{"access":{"destination":{"syslog":{"maxLength":4096}}}}}}' --type=merge
ingresscontroller.operator.openshift.io/default patched

melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator get ingresscontroller/default -oyaml                                                                                    
apiVersion: operator.openshift.io/v1
kind: IngressController
<----snip---->
spec:
  clientTLS:
    clientCA:
      name: ""
    clientCertificatePolicy: ""
  httpCompression: {}
  httpEmptyRequestsPolicy: Respond
  httpErrorCodePages:
    name: ""
  logging:
    access:
      destination:
        container:
          maxLength: 480
        syslog:
          address: 1.2.3.4
          maxLength: 4096
          port: 514
        type: Container
      logEmptyRequests: Log
  replicas: 2
  tuningOptions:
    reloadInterval: 0s
  unsupportedConfigOverrides: null


melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress exec router-default-59cf55666d-shq98 -- env | grep ROUTER_LOG_MAX_LENGTH 
Defaulted container "router" out of: router, logs
ROUTER_LOG_MAX_LENGTH=480


In another round of testing, only the syslog destination type was reflected in the env variable, not the container destination type.

I am also not sure whether configuring both destination types on the default ingress controller is a valid scenario.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Steps to Reproduce:

1. Edit the default ingress controller and add both destination type configs
2.
3.

Actual results:

Only one destination type's value is reflected in the haproxy.config file

Expected results:

Both types should be reflected

Additional info:

Bug OCPBUGS-19261: Update 4.15 openshift-enterprise-egress-dns-proxy image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/images/pull/153

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/images/pull/153

Bug OCPBUGS-19761: On an SNO with Telco DU profile must-gather perf-node-gather-daemonset fails: Error creating: pods "perf-node-gather-daemonset-" is forbidden: autoscaling.openshift.io/ManagementCPUsOverride the pod namespace does not allow the workload type management

View the Description View the linked PRs

Description of problem:

When running must-gather against an SNO with the Telco DU profile, the perf-node-gather-daemonset cannot start and fails with the error below:

 Warning  FailedCreate  2m37s (x16 over 5m21s)  daemonset-controller  Error creating: pods "perf-node-gather-daemonset-" is forbidden: autoscaling.openshift.io/ManagementCPUsOverride the pod namespace "openshift-must-gather-sbhml" does not allow the workload type management

must-gather retries for 300s and then reports that performance data collection is complete, even though the daemonset pod never came up.

[must-gather-nhbgr] POD 2023-09-26T10:15:39.591582116Z Waiting for performance profile collector pods to become ready: 1
[..]
[must-gather-nhbgr] POD 2023-09-26T10:21:07.108893075Z Waiting for performance profile collector pods to become ready: 300
[must-gather-nhbgr] POD 2023-09-26T10:21:08.473217146Z daemonset.apps "perf-node-gather-daemonset" deleted
[must-gather-nhbgr] POD 2023-09-26T10:21:08.480906220Z INFO: Node performance data collection complete.

Version-Release number of selected component (if applicable):

4.14.0-rc.2

How reproducible:

100%

Steps to Reproduce:

1. Deploy SNO with Telco DU profile
2. Run oc adm must-gather

Actual results:

Performance data collection does not run because the daemonset pod cannot be created.

Expected results:

Performance data collection runs.

Additional info:

DaemonSet describe:

oc -n openshift-must-gather-sbhml describe ds
Name:           perf-node-gather-daemonset
Selector:       name=perf-node-gather-daemonset
Node-Selector:  <none>
Labels:         <none>
Annotations:    deprecated.daemonset.template.generation: 1
Desired Number of Nodes Scheduled: 1
Current Number of Nodes Scheduled: 0
Number of Nodes Scheduled with Up-to-date Pods: 0
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 0
Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:       name=perf-node-gather-daemonset
  Annotations:  target.workload.openshift.io/management: {"effect": "PreferredDuringScheduling"}
  Containers:
   node-probe:
    Image:      registry.kni-qe-0.lab.eng.rdu2.redhat.com:5000/openshift-release-dev@sha256:2af2c135f69f162ed8e0cede609ddbd207d71a3c7bd49e9af3fcbb16737aa25a
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/bash
      -c
      echo ok > /tmp/healthy && sleep INF
    Limits:
      cpu:     100m
      memory:  256Mi
    Requests:
      cpu:        100m
      memory:     256Mi
    Readiness:    exec [cat /tmp/healthy] delay=5s timeout=1s period=5s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /host/podresources from podres (rw)
      /host/proc from proc (ro)
      /host/sys from sys (ro)
      /lib/modules from lib-modules (ro)
  Volumes:
   sys:
    Type:          HostPath (bare host directory volume)
    Path:          /sys
    HostPathType:  Directory
   proc:
    Type:          HostPath (bare host directory volume)
    Path:          /proc
    HostPathType:  Directory
   lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:  Directory
   podres:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/pod-resources
    HostPathType:  Directory
Events:
  Type     Reason        Age                     From                  Message
  ----     ------        ----                    ----                  -------
  Warning  FailedCreate  2m37s (x16 over 5m21s)  daemonset-controller  Error creating: pods "perf-node-gather-daemonset-" is forbidden: autoscaling.openshift.io/ManagementCPUsOverride the pod namespace "openshift-must-gather-sbhml" does not allow the workload type management

https://github.com/openshift/must-gather/pull/385

Bug OCPBUGS-22830: Specify google cloud CLI to version 447.0.0

View the Description View the linked PRs

Description of problem:

Google Cloud CLI stopped supporting Python 3.5-3.7 starting with version 448.0.0, which caused release CI jobs to fail with "ERROR: gcloud failed to load. You are running gcloud with Python 3.6, which is no longer supported by gcloud." The fix pins the CLI to version 447.0.0.
Job link: https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-o[…]cp-upi-f28-destructive/1719562110486188032

https://github.com/openshift/installer/pull/7663

Bug OCPBUGS-26600: HCP fails to deploy with TechPreviewNoUpgrade feature set

View the Description View the linked PRs

With the following configuration:

spec:
  configuration:
    featureGate:
      featureSet: TechPreviewNoUpgrade

$ oc get pod
NAME                                      READY   STATUS             RESTARTS      AGE
capi-provider-bd4858c47-sf5d5             0/2     Init:0/1           0             9m33s
cluster-api-85f69c8484-5n9ql              1/1     Running            0             9m33s
control-plane-operator-78c9478584-xnjmd   2/2     Running            0             9m33s
etcd-0                                    3/3     Running            0             9m10s
kube-apiserver-55bb575754-g4694           4/5     CrashLoopBackOff   6 (81s ago)   8m30s

$ oc logs kube-apiserver-55bb575754-g4694 -c kube-apiserver --tail=5
E0105 16:49:54.411837       1 controller.go:145] while syncing ConfigMap "kube-system/kube-apiserver-legacy-service-account-token-tracking", err: namespaces "kube-system" not found
I0105 16:49:54.415074       1 trace.go:236] Trace[236726897]: "Create" accept:application/vnd.kubernetes.protobuf, */*,audit-id:71496035-d1fe-4ee1-bc12-3b24022ea39c,client:::1,api-group:scheduling.k8s.io,api-version:v1,name:,subresource:,namespace:,protocol:HTTP/2.0,resource:priorityclasses,scope:resource,url:/apis/scheduling.k8s.io/v1/priorityclasses,user-agent:kube-apiserver/v1.29.0 (linux/amd64) kubernetes/9368fcd,verb:POST (05-Jan-2024 16:49:44.413) (total time: 10001ms):
Trace[236726897]: ---"Write to database call failed" len:174,err:priorityclasses.scheduling.k8s.io "system-node-critical" is forbidden: not yet ready to handle request 10001ms (16:49:54.415)
Trace[236726897]: [10.001615835s] [10.001615835s] END
F0105 16:49:54.415382       1 hooks.go:203] PostStartHook "scheduling/bootstrap-system-priority-classes" failed: unable to add default system priority classes: priorityclasses.scheduling.k8s.io "system-node-critical" is forbidden: not yet ready to handle request

https://github.com/openshift/hypershift/pull/3399

Task OSASINFRA-3295: OpenShift create install-config command broken due to the wrong client being used to list flavors

View the linked PRs

https://github.com/openshift/installer/pull/7723

Bug OCPBUGS-20478: The secret/vmware-vsphere-cloud-credentials in ns/openshift-cluster-csi-drivers is not synced correctly when updating secret/vsphere-creds in ns/kube-system

View the Description View the linked PRs

Description of problem:

The secret/vmware-vsphere-cloud-credentials in ns/openshift-cluster-csi-drivers is not synced correctly when updating secret/vsphere-creds in ns/kube-system

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-10-10-084534

How reproducible:

Always

Steps to Reproduce:

Before updating the secret:

$ oc -n kube-system get secret vsphere-creds -o yaml
apiVersion: v1
data:
  vcenter.devqe.ibmc.devcluster.openshift.com.password: xxx
  vcenter.devqe.ibmc.devcluster.openshift.com.username: xxx
kind: Secret
metadata:
  annotations:
    cloudcredential.openshift.io/mode: passthrough
...

The secret/vmware-vsphere-cloud-credentials in ns/openshift-cluster-csi-drivers has the same contents:

$ oc -n openshift-cluster-csi-drivers get secret vmware-vsphere-cloud-credentials -o yaml
apiVersion: v1
data:
  vcenter.devqe.ibmc.devcluster.openshift.com.password: xxx
  vcenter.devqe.ibmc.devcluster.openshift.com.username: xxx
kind: Secret
metadata:
  annotations:
    cloudcredential.openshift.io/credentials-request: openshift-cloud-credential-operator/openshift-vmware-vsphere-csi-driver-operator
…

Replace secret/vsphere-creds to use a new vCenter (for testing only):

$ oc -n kube-system get secret vsphere-creds -o yaml 
apiVersion: v1
data:
  vcsa2-qe.vmware.devcluster.openshift.com.password: xxx
  vcsa2-qe.vmware.devcluster.openshift.com.username: xxx
(Updated to vcsa2-qe)

vmware-vsphere-cloud-credentials now contains credentials for two vCenters:

$ oc -n openshift-cluster-csi-drivers get secret vmware-vsphere-cloud-credentials -o yaml
apiVersion: v1
data:
  vcenter.devqe.ibmc.devcluster.openshift.com.password: xxx
  vcenter.devqe.ibmc.devcluster.openshift.com.username: xxx
  vcsa2-qe.vmware.devcluster.openshift.com.password: xxx
  vcsa2-qe.vmware.devcluster.openshift.com.username: xxx
(devqe and vcsa2-qe)

Restore secret/vsphere-creds:

$ oc -n kube-system get secret vsphere-creds -o yaml
apiVersion: v1
data:
  vcenter.devqe.ibmc.devcluster.openshift.com.password: xxx
  vcenter.devqe.ibmc.devcluster.openshift.com.username: xxx
(Updated to devqe)

vmware-vsphere-cloud-credentials still contains credentials for two vCenters:

$ oc -n openshift-cluster-csi-drivers get secret vmware-vsphere-cloud-credentials -o yaml
apiVersion: v1
data:
  vcenter.devqe.ibmc.devcluster.openshift.com.password: xxx
  vcenter.devqe.ibmc.devcluster.openshift.com.username: xxx
  vcsa2-qe.vmware.devcluster.openshift.com.password: xxx
  vcsa2-qe.vmware.devcluster.openshift.com.username: xxx
(devqe and vcsa2-qe)

Actual results:

The secret/vmware-vsphere-cloud-credentials is not synced correctly: keys for the previous vCenter are not removed.

Expected results:

The secret/vmware-vsphere-cloud-credentials should be synced correctly and contain only the vCenter entries present in secret/vsphere-creds.

Additional info:

Storage vSphere csi driver controller pods are crash looping.
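
The sequence above is consistent with keys from secret/vsphere-creds being merged into the target secret rather than replacing its data, so entries for the old vCenter are never removed. The sketch contrasts the two behaviors with plain maps; mergeSecretData and replaceSecretData are hypothetical helpers, not cloud-credential-operator code, and the fix in the linked PR may be implemented differently.

```go
package main

// mergeSecretData copies src into dst key by key. Keys that were removed
// from src stay in dst, which reproduces the stale vCenter entries above.
func mergeSecretData(dst, src map[string][]byte) map[string][]byte {
	if dst == nil {
		dst = make(map[string][]byte, len(src))
	}
	for k, v := range src {
		dst[k] = v
	}
	return dst
}

// replaceSecretData returns a fresh copy of src, so the target ends up with
// exactly the keys currently present in the source secret.
func replaceSecretData(src map[string][]byte) map[string][]byte {
	out := make(map[string][]byte, len(src))
	for k, v := range src {
		// Copy the bytes so later changes to src do not alias the result.
		out[k] = append([]byte(nil), v...)
	}
	return out
}
```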

https://github.com/openshift/cloud-credential-operator/pull/628

Bug OCPBUGS-23761: After PatternFly5 update: Quick search input field is broken

View the Description View the linked PRs

Issue 22 from https://docs.google.com/spreadsheets/d/1TR3ENY-GE_LQL9F-xH6NahtRHdu6IrYGhLqDDA8y0EI/edit#gid=1035185624

Screenshot: https://drive.google.com/file/d/1JNaMRpOGEcGoyPg7xuoy5hHzyI1jCE3s/view?usp=sharing

https://github.com/openshift/console/pull/13398

Bug OCPBUGS-24148: Update 4.15 cluster-network-operator-container image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/cluster-network-operator/pull/2133

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-network-operator/pull/2133

Bug OCPBUGS-23309: oc-mirror should fail, not panic, when it fails to bind a port

View the Description View the linked PRs

Description of problem:

oc-mirror panics when it fails to bind a port.

Version-Release number of selected component (if applicable):

./oc-mirror version 
Logging to .oc-mirror.log
WARNING: This version information is deprecated and will be replaced with the output from --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.15.0-202311101707.p0.g1c8f538.assembly.stream-1c8f538", GitCommit:"1c8f538897c88011c51ab53ea5073547521f0676", GitTreeState:"clean", BuildDate:"2023-11-10T18:49:00Z", GoVersion:"go1.20.10 X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}

How reproducible:

always

Steps to Reproduce:

Run the command: oc-mirror --from file://out docker://localhost:5000/ocptest --v2 --config config.yaml --dest-tls-verify=false

Actual results:
oc-mirror --from file://out docker://localhost:5000/ocptest --v2 --config config.yaml --dest-tls-verify=false
--v2 flag identified, flow redirected to the oc-mirror v2 version. PLEASE DO NOT USE that. V2 is still under development and it is not ready to be used.
2023/11/15 13:04:47 [INFO] : mode diskToMirror
2023/11/15 13:04:47 [INFO] : local storage registry will log to /app1/1106/logs/registry.log
2023/11/15 13:04:47 [INFO] : starting local storage on :5000
panic: listen tcp :5000: bind: address already in use

goroutine 67 [running]:
github.com/openshift/oc-mirror/v2/pkg/cli.panicOnRegistryError(0x0?)
/go/src/github.com/openshift/oc-mirror/vendor/github.com/openshift/oc-mirror/v2/pkg/cli/executor.go:298 +0x4e
created by github.com/openshift/oc-mirror/v2/pkg/cli.(*ExecutorSchema).PrepareStorageAndLogs
/go/src/github.com/openshift/oc-mirror/vendor/github.com/openshift/oc-mirror/v2/pkg/cli/executor.go:286 +0x945

Expected results:

oc-mirror should exit with an error instead of panicking.

https://github.com/openshift/oc-mirror/pull/744

Bug OCPBUGS-20179: Nodepool metric does not correctly reflect nodepool state

View the Description View the linked PRs

Description of problem:

hypershift_nodepools_available_replicas does not properly reflect the nodepool.

$ oc get nodepools -n ocm-production-12345678
NAME              CLUSTER   DESIRED NODES   CURRENT NODES   AUTOSCALING   AUTOREPAIR   VERSION   UPDATINGVERSION   UPDATINGCONFIG   MESSAGE
re-test-workers   re-test   2               0               False         True         4.12.35                                      Minimum availability requires 2 replicas, current 0 available

Meanwhile, there are 3 hypershift_nodepools_available_replicas time series for the nodepools:
- re-test-worker2 reporting 1
- re-test-worker3 reporting 1
- re-test-workers reporting 0 (accurate)

The issue here is the two extra time series, which should not exist if the nodepool doesn't exist.

Version-Release number of selected component (if applicable):

4.12.35

How reproducible:

This particular cluster had its OIDC configuration along with other customer AWS account resources deleted, which might be connected to the misbehaviour of the metric.

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

The must-gather output and the metric time series are attached to the ticket.
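
One plausible cause of the extra series is that per-nodepool gauge values are set but never deleted when a nodepool goes away. The stdlib-only sketch below (hypothetical nodePool, sample and collectAvailableReplicas, not HyperShift's metrics code) shows the snapshot approach: samples are derived from the nodepools that exist at scrape time, so a deleted nodepool simply stops producing a series.

```go
package main

import "sort"

// nodePool is a hypothetical, trimmed-down view of a NodePool.
type nodePool struct {
	Name              string
	AvailableReplicas int
}

// sample is one value of hypershift_nodepools_available_replicas.
type sample struct {
	NodePool string
	Value    float64
}

// collectAvailableReplicas derives samples only from the nodepools that
// exist right now. Nothing is cached between scrapes, so no stale series
// can remain for nodepools that were deleted.
func collectAvailableReplicas(pools []nodePool) []sample {
	out := make([]sample, 0, len(pools))
	for _, p := range pools {
		out = append(out, sample{NodePool: p.Name, Value: float64(p.AvailableReplicas)})
	}
	// Sort by name for stable output.
	sort.Slice(out, func(i, j int) bool { return out[i].NodePool < out[j].NodePool })
	return out
}
```

With the Prometheus Go client, the same effect is usually achieved by implementing a custom prometheus.Collector that emits constant metrics on each scrape, or by calling Reset or DeleteLabelValues on a GaugeVec when nodepools are removed; which fits HyperShift is not established here.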

https://github.com/openshift/hypershift/pull/2671

Bug OCPBUGS-23956: After PatternFly5 update: Task node has text decoration on hover

View the Description View the linked PRs

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/console/pull/13371

Story TRT-1376: 4.15 CI Payloads Failing on GCP with Credentials Operator problems

View the Description View the linked PRs

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/aggregated-gcp-ovn-upgrade-4.15-micro-release-openshift-release-analysis-aggregator/1730445943465054208

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/aggregated-gcp-ovn-upgrade-4.15-micro-release-openshift-release-analysis-aggregator/1730385372728266752

All jobs that run seem to hit the same quota problem we saw recently:

failed to grant creds: error syncing creds in mint-mode: error creating custom role: rpc error: code = ResourceExhausted desc = Maximum number of roles reached. Maximum is: 300\nerror details: retry in 24h0m1s

This time it seems to be surfacing on a new credentials request from storage, openshift-gcp-pd-csi-driver-operator, which was just moved from predefined roles to fine-grained permissions in https://github.com/openshift/cluster-storage-operator/pull/410; that is likely why we are now tripping over this limit.

We're going to revert and buy time for CCO team to investigate.

https://github.com/openshift/cluster-storage-operator/pull/426

Bug OCPBUGS-18552: Ensure vlan interface names will be <= 15 characters

View the Description View the linked PRs

Description of problem:

An assisted-service fix, https://issues.redhat.com//browse/MGMT-15340, resolved an issue in the nmstateconfig scripts to ensure VLAN interface names are at most 15 characters. The same fix needs to be merged into the agent installer.
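
Linux limits interface names to 15 characters (IFNAMSIZ is 16 bytes, including the terminating NUL). The sketch below shows one way to keep a "<base>.<vlan-id>" name within that limit by truncating the base. vlanIfaceName and the naming scheme are hypothetical and assume an ASCII base name; the actual assisted-service and installer fix may use a different scheme.

```go
package main

import (
	"fmt"
	"strconv"
)

// maxIfaceNameLen is the longest usable Linux interface name:
// IFNAMSIZ (16) minus the terminating NUL byte.
const maxIfaceNameLen = 15

// vlanIfaceName builds a hypothetical "<base>.<id>" VLAN interface name,
// truncating base so the result never exceeds maxIfaceNameLen.
func vlanIfaceName(base string, id int) (string, error) {
	if id < 1 || id > 4094 {
		return "", fmt.Errorf("invalid VLAN ID %d", id)
	}
	suffix := "." + strconv.Itoa(id)
	if keep := maxIfaceNameLen - len(suffix); len(base) > keep {
		base = base[:keep]
	}
	return base + suffix, nil
}
```

Truncation can make two long base names collide, so a real implementation may need a hash-based suffix or a collision check.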

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1. Create an agent image with static networking using a vlan with a long name (greater than 15 characters)
2. Boot a host with the agent image

Actual results:

The installation fails.

Expected results:

The installation succeeds.

Additional info:

https://github.com/openshift/installer/pull/7486

Bug OCPBUGS-18906: Remove dependency on k8s.io/kubernetes packages

View the Description View the linked PRs

Using packages from k8s.io/kubernetes is not supported: https://github.com/kubernetes/kubernetes/issues/79384#issuecomment-505627280

This came about in this slack thread: https://redhat-internal.slack.com/archives/C02CZNQHGN8/p1694210392218409?thread_ts=1694207119.447459&cid=C02CZNQHGN8

https://github.com/openshift/machine-config-operator/pull/3913

Task MON-3503: Synchronize versions of the downstream components

View the linked PRs

Bug OCPBUGS-23108: Should reference configmaps instead of secrets

View the Description View the linked PRs

Description of problem:

The code references secrets instead of configmaps.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/csi-driver-shared-resource/pull/151

Bug OCPBUGS-10423: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/hypershift/pull/3141

Bug OCPBUGS-24218: Scheduler TLS artifacts should have ownership annotations

View the linked PRs

https://github.com/openshift/cluster-kube-scheduler-operator/pull/511

Bug OCPBUGS-18762: Not all control plane components are returned to release image after controlPlaneRelease field is removed in the HostedCluster CR

View the Description View the linked PRs

Description of problem:

After a control plane release upgrade, once the controlPlaneRelease field is removed from the HostedCluster CR, only capi-provider, cluster-api and control-plane-operator are restarted and run the release image. The other components are not restarted and still run the control plane release image.

Version-Release number of selected component (if applicable):

How reproducible:

always

Steps to Reproduce:

1. Create a cluster on 4.14-2023-09-06-180503
2. Upgrade the control plane release to 4.14-2023-09-07-180503
3. Remove controlPlaneRelease from the HostedCluster CR
4. Check the images of all pods/containers in the control plane namespace

Actual results:

Only capi-provider, cluster-api and control-plane-operator are restarted and run release image 4.14-2023-09-06-180503; the other components are not restarted and still run control plane release image 4.14-2023-09-07-180503.

jiezhao-mac:hypershift jiezhao$ oc get hostedcluster -n clusters
NAME       VERSION                         KUBECONFIG                  PROGRESS    AVAILABLE   PROGRESSING   MESSAGE
jie-test   4.14.0-0.ci-2023-09-06-180503   jie-test-admin-kubeconfig   Completed   True        False         The hosted control plane is available

jiezhao-mac:hypershift jiezhao$
- lastTransitionTime: "2023-09-08T01:54:54Z"
  message: '[cluster-api deployment has 1 unavailable replicas, control-plane-operator deployment has 1 unavailable replicas]'
  observedGeneration: 5
  reason: UnavailableReplicas
  status: "True"
  type: Degraded

Expected results:

The control plane should return to release image 4.14-2023-09-06-180503 with all components in a healthy state.

Additional info:

https://github.com/openshift/hypershift/pull/3004

Bug OCPBUGS-19269: Update 4.15 ose-machine-api-provider-azure image to be consistent with ART

Please review the following PR: https://github.com/openshift/machine-api-provider-azure/pull/75

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/machine-api-provider-azure/pull/75

Bug OCPBUGS-26209: PipelineRuns details page get active on Task selection on logs page and logs page get empty on logs tab selection

This is a clone of issue OCPBUGS-25898. The following is the description of the original issue:
—
Description of problem:

Navigation on the PipelineRun logs page breaks when navigating through the tasks on the PipelineRun Logs tab.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1. Navigate to PipelineRuns details page and select the Logs tab.
    2. Navigate through the tasks of the PipelineRun.

Actual results:

- The Details tab becomes active when any task is selected
- The Logs page becomes empty when the Logs tab is selected again
- The last task is not selected for completed PipelineRuns

Expected results:

- The Logs tab should stay active while the user is on the Logs tab
- The last task should be selected for completed PipelineRuns

Additional info:

It is a regression caused by a change in the tab-selection logic of the HorizontalNav component.

https://github.com/openshift/console/pull/13216/files#diff-267d61f330ad6cd9b0f2d743d9ff27929fbe7001780d73e1ec88599d3778eb96R177-R190

Video: https://drive.google.com/file/d/15fx9GWO2dRh4uaibRmZ4VTk4HFxQ7NId/view?usp=sharing

https://github.com/openshift/console/pull/13484

Bug OCPBUGS-20238: [OVN-Kubernetes] Incorrect webhook error & exit handling

When there is an error on HTTP listen, the webhook does not handle it in a way that makes recovery possible; instead it hangs without printing anything useful in the logs.

This was seen after the change in https://issues.redhat.com//browse/OCPBUGS-20104, which reconfigured the webhook to run as non-root: listen would fail on upgrade because the old webhook instance was still running as root, which causes an error due to the SOREUSE socket option.

The webhook should crash-loop instead, which gives it a chance to recover, although recovery itself might still be racy depending on whether Kubernetes kills the old webhook instance before it notices that the new instance has crashed.

https://github.com/openshift/ovn-kubernetes/pull/1931

Bug OCPBUGS-17906: HyperShift guest cluster does not have cloudcredentials instance

Description of problem:

On a HyperShift (guest) cluster, the EFS driver pod is stuck in the ContainerCreating state.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-08-11-055332

How reproducible:

Always

Steps to Reproduce:

1. Create a HyperShift cluster.
Flexy template: aos-4_14/ipi-on-aws/versioned-installer-ovn-hypershift-ci

2. Install the EFS operator and driver from a YAML file or the web console, as described in the steps below.
a) Create an IAM role with the ccoctl tool and take the role ARN value from its output.
b) Install the EFS operator using the above role ARN value.
c) Check that the EFS operator, node, and controller pods are up and running.

// og-sub-hcp.yaml
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  generateName: openshift-cluster-csi-drivers-
  namespace: openshift-cluster-csi-drivers
spec:
  namespaces:
  - ""
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: aws-efs-csi-driver-operator
  namespace: openshift-cluster-csi-drivers
spec:
    channel: stable
    name: aws-efs-csi-driver-operator
    source: qe-app-registry
    sourceNamespace: openshift-marketplace
    config:
      env:
      - name: ROLEARN
        value: arn:aws:iam::301721915996:role/hypershift-ci-16666-openshift-cluster-csi-drivers-aws-efs-cloud-

// driver.yaml
apiVersion: operator.openshift.io/v1
kind: ClusterCSIDriver
metadata:
  name: efs.csi.aws.com
spec:
  logLevel: TraceAll
  managementState: Managed
  operatorLogLevel: TraceAll

Actual results:

aws-efs-csi-driver-controller-699664644f-dkfdk   0/4     ContainerCreating   0          87m

Expected results:

EFS controller pods should be up and running

Additional info:

oc -n openshift-cluster-csi-drivers logs aws-efs-csi-driver-operator-6758c5dc46-b75hb

E0821 08:51:25.160599       1 base_controller.go:266] "AWSEFSDriverCredentialsRequestController" controller failed to sync "key", err: cloudcredential.operator.openshift.io "cluster" not found

Discussion: https://redhat-internal.slack.com/archives/GK0DA0JR5/p1692606247221239
Installation steps epic: https://issues.redhat.com/browse/STOR-1421

https://github.com/openshift/hypershift/pull/3009

Bug OCPBUGS-21777: BMH keep showing power status as off while IMM is powered on

Description of problem:

The BMH shows the host as powered off even when the node is up. This incorrect BMH status causes the customer's software to behave incorrectly.

$ oc get bmh -n openshift-machine-api control-1-ru2 -o json | jq '.status|.operationalStatus,.poweredOn,.provisioning.state'
"OK"
false
"externally provisioned"


The following error can be seen:
2023-10-10T06:05:02.554453960Z {"level":"info","ts":1696917902.5544183,"logger":"provisioner.ironic","msg":"could not update node settings in ironic, busy","host":"openshift-machine-api~control-1-ru4"}

Version-Release number of selected component (if applicable):

How reproducible:

100%

Steps to Reproduce:

1. Launch the cluster with OCP v4.12.32 on Lenovo servers
2.
3.

Actual results:

It reports an incorrect status for the node.

Expected results:

It should report the correct status of the node.

Additional info:

Bug OCPBUGS-25835: Installer should validate that 'baremetal' capability is enabled for baremetal platform

If the user sets baselineCapabilitySet: None in the install-config and does not explicitly enable the baremetal capability, but still uses platform: baremetal, the installation reliably fails.

This failure takes the form of a timeout with the bootkube logs (not easily accessible to the user) full of errors like:

bootkube.sh[46065]: "99_baremetal-provisioning-config.yaml": unable to get REST mapping for "99_baremetal-provisioning-config.yaml": no matches for kind "Provisioning" in version "metal3.io/v1alpha1"
bootkube.sh[46065]: "99_openshift-cluster-api_hosts-0.yaml": unable to get REST mapping for "99_openshift-cluster-api_hosts-0.yaml": no matches for kind "BareMetalHost" in version "metal3.io/v1alpha1"

Since the installer can tell while processing the install-config that the baremetal capability is missing, we should detect this and error out immediately to save the user an hour of their life and us a support case.

Although this was found on an agent install, I believe the same will apply to a baremetal IPI install.
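The early check proposed above could look roughly like the following Go sketch. The `installConfig` struct is a hypothetical, trimmed-down stand-in for the installer's real types, and treating only `baselineCapabilitySet: None` as excluding `baremetal` is an assumption of the sketch; this is not the code from the linked PR.

```go
package main

import (
	"errors"
	"fmt"
)

// installConfig is a hypothetical, trimmed-down stand-in for the installer's
// install-config; only the fields this check needs are modelled.
type installConfig struct {
	Platform                      string   // e.g. "baremetal"
	BaselineCapabilitySet         string   // e.g. "None"
	AdditionalEnabledCapabilities []string // e.g. []string{"baremetal"}
}

var errMissingBaremetalCapability = errors.New(
	`platform "baremetal" requires the "baremetal" capability; add it to additionalEnabledCapabilities`)

// validateBaremetalCapability rejects platform: baremetal when the baseline
// capability set is "None" and baremetal is not explicitly re-enabled.
func validateBaremetalCapability(ic installConfig) error {
	if ic.Platform != "baremetal" || ic.BaselineCapabilitySet != "None" {
		return nil
	}
	for _, c := range ic.AdditionalEnabledCapabilities {
		if c == "baremetal" {
			return nil
		}
	}
	return errMissingBaremetalCapability
}

func main() {
	ic := installConfig{Platform: "baremetal", BaselineCapabilitySet: "None"}
	if err := validateBaremetalCapability(ic); err != nil {
		fmt.Println("invalid install-config:", err)
	}
}
```

Failing at this point turns an hour-long bootkube timeout into an immediate, actionable error message.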

https://github.com/openshift/installer/pull/7901

Bug OCPBUGS-25355: setting TLSSecurityProfile with no minTLSVersion crashes controller

This is a clone of issue OCPBUGS-24226. The following is the description of the original issue:
—
Maxim Patlasov pointed this out in STOR-1453, but somehow we still missed it. I tested this on 4.15.0-0.ci-2023-11-29-021749.

It is possible to set a custom TLSSecurityProfile without minTLSVersion:

$ oc edit apiserver cluster
...
spec:
  tlsSecurityProfile:
    type: Custom
    custom:
      ciphers:
      - ECDHE-ECDSA-CHACHA20-POLY1305
      - ECDHE-ECDSA-AES128-GCM-SHA256

This causes the controller to crash loop:

$ oc get pods -n openshift-cluster-csi-drivers
NAME                                             READY   STATUS             RESTARTS       AGE
aws-ebs-csi-driver-controller-589c44468b-gjrs2   6/11    CrashLoopBackOff   10 (18s ago)   37s
...

because the `${TLS_MIN_VERSION}` placeholder is never replaced:

- --tls-min-version=${TLS_MIN_VERSION}
- --tls-min-version=${TLS_MIN_VERSION}
- --tls-min-version=${TLS_MIN_VERSION}
- --tls-min-version=${TLS_MIN_VERSION}
- --tls-min-version=${TLS_MIN_VERSION}

The observed config in the ClusterCSIDriver shows an empty string:

$ oc get clustercsidriver ebs.csi.aws.com -o json | jq .spec.observedConfig
{
  "targetcsiconfig": {
    "servingInfo": {
      "cipherSuites": [
        "TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256",
        "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256"
      ],
      "minTLSVersion": ""
    }
  }
}

which means minTLSVersion is empty when we get to this line, and the string replacement is not done:

https://github.com/openshift/library-go/blob/c7f15dcc10f5d0b89e8f4c5d50cd313ae158de20/pkg/operator/csi/csidrivercontrollerservicecontroller/helpers.go#L234

So it seems we have a couple of options:

1) completely omit the --tls-min-version arg if minTLSVersion is empty, or
2) set --tls-min-version to the same default value we would use if TLSSecurityProfile is not present in the apiserver object

Bug OCPBUGS-20511: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/platform-operators/pull/92

Bug OCPBUGS-24091: Update 4.15 ose-azure-cluster-api-controllers-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-api-provider-azure/pull/291

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-api-provider-azure/pull/291

Bug MGMT-15971: local-agent-cluster-cluster-deployment displayed as Detached

Description of the problem:

Right after installation, the hub cluster shows two clusters:

local-agent-cluster-cluster-deployment
local-cluster

Status of local-agent-cluster-cluster-deployment is Detached.

There is also no information about Labels, Nodes, and Add-ons.

How reproducible:

100%

Steps to reproduce:

1. Deploy OCP 4.14 x86_64

2. Open cluster management console

3. Open All clusters view

Actual results:

Status of local-agent-cluster-cluster-deployment is Detached.

Expected results:

Status of local-agent-cluster-cluster-deployment is Ready.

https://github.com/openshift/assisted-service/pull/5575

Bug OCPBUGS-24105: Update 4.15 prometheus-config-reloader-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/prometheus-operator/pull/259

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/prometheus-operator/pull/259

Bug OCPBUGS-24156: Update 4.15 ose-cluster-kube-storage-version-migrator-operator-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-kube-storage-version-migrator-operator/pull/100

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-kube-storage-version-migrator-operator/pull/100

Bug OCPBUGS-18754: tuned pod in the guest cluster uses control plane release image after controlplane release upgrade

Description of problem:

After a control plane release upgrade, the 'tuned' pod in the guest cluster uses the control plane release image.

Version-Release number of selected component (if applicable):

How reproducible:

always

Steps to Reproduce:

1. create a cluster in 4.14.0-0.ci-2023-09-06-180503
2. control plane release upgrade to 4.14-2023-09-07-180503
3. in the guest cluster, check the container image of the tuned pod

Actual results:

The tuned pod uses control plane release image 4.14-2023-09-07-180503

Expected results:

The tuned pod uses release image 4.14.0-0.ci-2023-09-06-180503

Additional info:

After the control plane release upgrade, cluster-node-tuning-operator in the control plane namespace uses the control plane release image:

jiezhao-mac:hypershift jiezhao$ oc get pods cluster-node-tuning-operator-6dc549ffdf-jhj2k -n clusters-jie-test -ojsonpath='{.spec.containers[].name}{"\n"}'
cluster-node-tuning-operator
jiezhao-mac:hypershift jiezhao$ oc get pods cluster-node-tuning-operator-6dc549ffdf-jhj2k -n clusters-jie-test -ojsonpath='{.spec.containers[].image}{"\n"}'
registry.ci.openshift.org/ocp/4.14-2023-09-07-180503@sha256:60bd6e2e8db761fb4b3b9d68c1da16bf0371343e3df8e72e12a2502640173990

https://github.com/openshift/hypershift/pull/3003

Bug OCPBUGS-23759: Ironic side of external_http_url (METAL-163) is not wired in correctly

Description of problem:

In the implementation of METAL-163, support for the new Ironic Node field external_http_url was only added for floppy-based configuration images, not for the CD images that we use in OpenShift. As a result, external_http_url is a no-op in OpenShift.

See https://review.opendev.org/c/openstack/ironic/+/901696

https://github.com/openshift/ironic-image/pull/432

Bug OCPBUGS-20181: unit test job failure rates are high in oc

Description of problem:

Unit test failure rates are high: https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-oc-master-unit

TestNewAppRunAll/emptyDir_volumes is failing

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_oc/1557/pull-ci-openshift-oc-master-unit/1710206848667226112

Version-Release number of selected component (if applicable):

How reproducible:

Run the unit tests locally or in CI and observe that the unit test job fails.

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/oc/pull/1559

Bug OCPBUGS-24090: Update 4.15 ose-openshift-apiserver-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/openshift-apiserver/pull/406

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/openshift-apiserver/pull/406

Bug OCPBUGS-21670: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-olm-operator/pull/32

Bug OCPBUGS-18995: SDN: 4.14 after ec4 has a higher pod ready latency compared to 4.13.10

Description of problem:

This is to track the SDN specific issue in https://issues.redhat.com/browse/OCPBUGS-18389

4.14 nightly has a higher pod ready latency than 4.14 ec4 and 4.13.z in the node-density (lite) test.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-09-11-201102

How reproducible:

Every time

Steps to Reproduce:

1. Install an SDN cluster and scale it up to 24 worker nodes; install 3 infra nodes and move the monitoring, ingress, and registry components to the infra nodes.
2. Run the node-density (lite) test with 245 pods per node
3. Compare the pod ready latency to 4.13.z, and 4.14 ec4

Actual results:

4.14 nightly has a higher pod ready latency compared to 4.14 ec4 and 4.13.10

Expected results:

4.14 should have pod ready latency similar to the previous release

Additional info:

OCP Version	Flexy Id	Scale Ci Job	Grafana URL	Cloud	Arch Type	Network Type	Worker Count	PODS_PER_NODE	Avg Pod Ready (ms)	P99 Pod Ready (ms)	Must-gather
4.14.0-ec.4	231559	292	087eb40c-6600-4db3-a9fd-3b959f4a434a	aws	amd64	SDN	24	245	2186	3256	https://drive.google.com/file/d/1NInCiai7WWIIVT8uL-5KKeQl9CtQN_Ck/view?usp=drive_link
4.14.0-0.nightly-2023-09-02-132842	231558	291	62404e34-672e-4168-b4cc-0bd575768aad	aws	amd64	SDN	24	245	58725	294279	https://drive.google.com/file/d/1BbVeNrWzVdogFhYihNfv-99_q8oj6eCN/view?usp=drive_link

With the new multus image provided by Dan Williams in https://issues.redhat.com/browse/OCPBUGS-18389, the latency on the 24-node SDN cluster is similar to the latency without the fix.

% oc -n openshift-network-operator get deployment.apps/network-operator -o yaml | grep MULTUS_IMAGE -A 1
        - name: MULTUS_IMAGE
          value: quay.io/dcbw/multus-cni:informer 
 % oc get pod -n openshift-multus -o yaml | grep image: | grep multus
      image: quay.io/dcbw/multus-cni:informer
....

OCP Version	Flexy Id	Scale Ci Job	Grafana URL	Cloud	Arch Type	Network Type	Worker Count	PODS_PER_NODE	Avg Pod Ready (ms)	P99 Pod Ready (ms)	Must-gather
4.14.0-0.nightly-2023-09-11-201102 quay.io/dcbw/multus-cni:informer	232389	314	f2c290c1-73ea-4f10-a797-3ab9d45e94b3	aws	amd64	SDN	24	245	61234	311776	https://drive.google.com/file/d/1o7JXJAd_V3Fzw81pTaLXQn1ms44lX6v5/view?usp=drive_link
4.14.0-ec.4	231559	292	087eb40c-6600-4db3-a9fd-3b959f4a434a	aws	amd64	SDN	24	245	2186	3256	https://drive.google.com/file/d/1NInCiai7WWIIVT8uL-5KKeQl9CtQN_Ck/view?usp=drive_link
4.14.0-0.nightly-2023-09-02-132842	231558	291	62404e34-672e-4168-b4cc-0bd575768aad	aws	amd64	SDN	24	245	58725	294279	https://drive.google.com/file/d/1BbVeNrWzVdogFhYihNfv-99_q8oj6eCN/view?usp=drive_link

Zenghui Shi and Peng Liu requested modifying the multus-daemon-config ConfigMap to remove the readinessindicatorfile flag:

scale down the CNO deployment to 0
edit the ConfigMap to remove the readinessindicatorfile entry for 80-openshift-network.conf (SDN) or 10-ovn-kubernetes.conf (OVN-K)
restart (delete) the multus pod on each worker

Steps:

oc scale --replicas=0 -n openshift-network-operator deployments network-operator
oc edit cm multus-daemon-config -n openshift-multus, and remove the line "readinessindicatorfile": "/host/run/multus/cni/net.d/80-openshift-network.conf",
oc get po -n openshift-multus | grep multus- | egrep -v "multus-additional|multus-admission" | awk '{print $1}' | xargs oc delete po -n openshift-multus

The readinessindicatorfile flag is now removed and all multus pods are restarted:

% oc get cm multus-daemon-config -n openshift-multus -o yaml | grep readinessindicatorfile -c
0

Test result: P99 is better than without the fix (removing readinessindicatorfile) but is still worse than ec4, and the average is still bad.

OCP Version	Flexy Id	Scale Ci Job	Grafana URL	Cloud	Arch Type	Network Type	Worker Count	PODS_PER_NODE	Avg Pod Ready (ms)	P99 Pod Ready (ms)	Must-gather
4.14.0-0.nightly-2023-09-11-201102 quay.io/dcbw/multus-cni:informer and remove `readinessindicatorfile` flag	232389	316	d7a754aa-4f52-49eb-80cf-907bee38a81b	aws	amd64	SDN	24	245	51775	105296	https://drive.google.com/file/d/1h-3JeZXQRO-zsgWzen6aNDQfSDqoKAs2/view?usp=drive_link

Zenghui Shi and Peng Liu requested setting logLevel to debug in addition to removing the readinessindicatorfile flag:

edit the ConfigMap to change "logLevel" from "verbose" to "debug", and restart all multus pods

logLevel is now debug and all multus pods are restarted:

% oc get cm multus-daemon-config -n openshift-multus -o yaml | grep logLevel
        "logLevel": "debug",
% oc get cm multus-daemon-config -n openshift-multus -o yaml | grep readinessindicatorfile -c
0

OCP Version	Flexy Id	Scale Ci Job	Grafana URL	Cloud	Arch Type	Network Type	Worker Count	PODS_PER_NODE	Avg Pod Ready (ms)	P99 Pod Ready (ms)	Must-gather
4.14.0-0.nightly-2023-09-11-201102 quay.io/dcbw/multus-cni:informer and remove `readinessindicatorfile` flag and logLevel=debug	232389	320	5d1d3e6a-bfa1-4a4b-bbfc-daedc5605f7d	aws	amd64	SDN	24	245	49586	105314	https://drive.google.com/file/d/1p1PDbnqm0NlWND-komc9jbQ1PyQMeWcV/view?usp=drive_link

https://github.com/openshift/multus-cni/pull/186

Bug OCPBUGS-20210: Invalid egressIP object caused ovnkube-node pods CLBO

Description of problem:

An invalid EgressIP object caused the ovnkube-node pods to go into CrashLoopBackOff (CLBO).

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-10-05-195247

How reproducible:

Always

Steps to Reproduce:

1. Label one node as an egress node
2. Create an EgressIP object with an empty label key and value:
oc get egressip -o yaml
apiVersion: v1
items:
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    creationTimestamp: "2023-10-07T09:08:28Z"
    generation: 2
    name: egressip-test
    resourceVersion: "122021"
    uid: 23445450-37d5-4ec3-b8fe-d8352a19e703
  spec:
    egressIPs:
    - 10.0.70.100
    namespaceSelector:
      matchLabels:
        "": ""
    podSelector:
      matchLabels:
        "": ""
  status:
    items:
    - egressIP: 10.0.70.100
      node: ip-10-0-70-135
kind: List
metadata:
  resourceVersion: ""

3. Create a namespace and test pods

Actual results:

The test pods are stuck in the ContainerCreating status:
% oc get pods -n hrw
NAME            READY   STATUS              RESTARTS   AGE
test-rc-hwmns   0/1     ContainerCreating   0          45s
test-rc-p9kl8   0/1     ContainerCreating   0          45s
 % oc describe pod test-rc-hwmns   -n hrw
Name:             test-rc-hwmns
Namespace:        hrw
Priority:         0
Service Account:  default
Node:             ip-10-0-70-125/10.0.70.125
Start Time:       Sat, 07 Oct 2023 17:08:50 +0800
Labels:           name=test-pods
Annotations:      k8s.ovn.org/pod-networks:
                    {"default":{"ip_addresses":["10.129.2.11/23"],"mac_address":"0a:58:0a:81:02:0b","gateway_ips":["10.129.2.1"],"routes":[{"dest":"10.128.0.0...
                  openshift.io/scc: restricted-v2
                  seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:           Pending
IP:               
IPs:              <none>
Controlled By:    ReplicationController/test-rc
Containers:
  test-pod:
    Container ID:   
    Image:          quay.io/openshifttest/hello-sdn@sha256:c89445416459e7adea9a5a416b3365ed3d74f2491beb904d61dc8d1eb89a72a4
    Image ID:       
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  340Mi
    Requests:
      memory:     340Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-7vlz8 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-7vlz8:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age   From               Message
  ----     ------                  ----  ----               -------
  Normal   Scheduled               59s   default-scheduler  Successfully assigned hrw/test-rc-hwmns to ip-10-0-70-125
  Warning  FailedCreatePodSandBox  59s   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_test-rc-hwmns_hrw_d72a4216-b94b-4034-a9f7-526758055994_0(1ad74472b9e985cee4a3081f5912b3d4553351d14764d3bfece1d174146f90ca): error adding pod hrw_test-rc-hwmns to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: '&{ContainerID:1ad74472b9e985cee4a3081f5912b3d4553351d14764d3bfece1d174146f90ca Netns:/var/run/netns/131f3670-1a49-4088-9002-5624a3acc6d3 IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=hrw;K8S_POD_NAME=test-rc-hwmns;K8S_POD_INFRA_CONTAINER_ID=1ad74472b9e985cee4a3081f5912b3d4553351d14764d3bfece1d174146f90ca;K8S_POD_UID=d72a4216-b94b-4034-a9f7-526758055994 Path: StdinData:[123 34 98 105 110 68 105 114 34 58 34 47 118 97 114 47 108 105 98 47 99 110 105 47 98 105 110 34 44 34 99 104 114 111 111 116 68 105 114 34 58 34 47 104 111 115 116 114 111 111 116 34 44 34 99 108 117 115 116 101 114 78 101 116 119 111 114 107 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 99 110 105 47 110 101 116 46 100 47 49 48 45 111 118 110 45 107 117 98 101 114 110 101 116 101 115 46 99 111 110 102 34 44 34 99 110 105 67 111 110 102 105 103 68 105 114 34 58 34 47 104 111 115 116 47 101 116 99 47 99 110 105 47 110 101 116 46 100 34 44 34 99 110 105 86 101 114 115 105 111 110 34 58 34 48 46 51 46 49 34 44 34 100 97 101 109 111 110 83 111 99 107 101 116 68 105 114 34 58 34 47 114 117 110 47 109 117 108 116 117 115 47 115 111 99 107 101 116 34 44 34 103 108 111 98 97 108 78 97 109 101 115 112 97 99 101 115 34 58 34 100 101 102 97 117 108 116 44 111 112 101 110 115 104 105 102 116 45 109 117 108 116 117 115 44 111 112 101 110 115 104 105 102 116 45 115 114 105 111 118 45 110 101 116 119 111 114 107 45 111 112 101 114 97 116 111 114 34 44 34 108 111 103 76 101 118 101 108 34 58 34 118 101 114 98 111 
115 101 34 44 34 108 111 103 84 111 83 116 100 101 114 114 34 58 116 114 117 101 44 34 109 117 108 116 117 115 65 117 116 111 99 111 110 102 105 103 68 105 114 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 99 110 105 47 110 101 116 46 100 34 44 34 109 117 108 116 117 115 67 111 110 102 105 103 70 105 108 101 34 58 34 97 117 116 111 34 44 34 110 97 109 101 34 58 34 109 117 108 116 117 115 45 99 110 105 45 110 101 116 119 111 114 107 34 44 34 110 97 109 101 115 112 97 99 101 73 115 111 108 97 116 105 111 110 34 58 116 114 117 101 44 34 112 101 114 78 111 100 101 67 101 114 116 105 102 105 99 97 116 101 34 58 123 34 98 111 111 116 115 116 114 97 112 75 117 98 101 99 111 110 102 105 103 34 58 34 47 104 111 115 116 114 111 111 116 47 118 97 114 47 108 105 98 47 107 117 98 101 108 101 116 47 107 117 98 101 99 111 110 102 105 103 34 44 34 99 101 114 116 68 105 114 34 58 34 47 101 116 99 47 99 110 105 47 109 117 108 116 117 115 47 99 101 114 116 115 34 44 34 99 101 114 116 68 117 114 97 116 105 111 110 34 58 34 50 52 104 34 44 34 101 110 97 98 108 101 100 34 58 116 114 117 101 125 44 34 115 111 99 107 101 116 68 105 114 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 115 111 99 107 101 116 34 44 34 116 121 112 101 34 58 34 109 117 108 116 117 115 45 115 104 105 109 34 125]} ContainerID:"1ad74472b9e985cee4a3081f5912b3d4553351d14764d3bfece1d174146f90ca" Netns:"/var/run/netns/131f3670-1a49-4088-9002-5624a3acc6d3" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=hrw;K8S_POD_NAME=test-rc-hwmns;K8S_POD_INFRA_CONTAINER_ID=1ad74472b9e985cee4a3081f5912b3d4553351d14764d3bfece1d174146f90ca;K8S_POD_UID=d72a4216-b94b-4034-a9f7-526758055994" Path:"" ERRORED: error configuring pod [hrw/test-rc-hwmns] networking: [hrw/test-rc-hwmns/d72a4216-b94b-4034-a9f7-526758055994:ovn-kubernetes]: error adding container to network "ovn-kubernetes": failed to send CNI request: Post "http://dummy/": dial unix 
/var/run/ovn-kubernetes/cni//ovn-cni-server.sock: connect: connection refused
'
 
% oc get pods -n openshift-ovn-kubernetes                         
NAME                                     READY   STATUS             RESTARTS        AGE
ovnkube-control-plane-85f96b444b-2bdwf   2/2     Running            0               5h27m
ovnkube-control-plane-85f96b444b-2mhfj   2/2     Running            0               5h27m
ovnkube-control-plane-85f96b444b-ddjhx   2/2     Running            0               5h27m
ovnkube-node-5fkb5                       7/8     CrashLoopBackOff   6 (2m52s ago)   13m
ovnkube-node-p7qvr                       7/8     CrashLoopBackOff   6 (2m56s ago)   13m
ovnkube-node-tzhlb                       7/8     CrashLoopBackOff   6 (2m51s ago)   13m
ovnkube-node-x5849                       7/8     CrashLoopBackOff   6 (2m57s ago)   13m
ovnkube-node-xscbr                       7/8     CrashLoopBackOff   6 (2m35s ago)   13m

    exec /usr/bin/ovnkube --init-ovnkube-controller "${K8S_NODE}" --init-node "${K8S_NODE}" \
        --config-file=/run/ovnkube-config/ovnkube.conf \
        --ovn-empty-lb-events \
        --loglevel "${OVN_KUBE_LOG_LEVEL}" \
        --inactivity-probe="${OVN_CONTROLLER_INACTIVITY_PROBE}" \
        ${gateway_mode_flags} \
        ${node_mgmt_port_netdev_flags} \
        --metrics-bind-address "127.0.0.1:29103" \
        --ovn-metrics-bind-address "127.0.0.1:29105" \
        --metrics-enable-pprof \
        --metrics-enable-config-duration \
        --export-ovs-metrics \
        --disable-snat-multiple-gws \
        ${export_network_flows_flags} \
        ${multi_network_enabled_flag} \
        ${multi_network_policy_enabled_flag} \
        ${admin_network_policy_enabled_flag} \
        --enable-multicast \
        --zone ${K8S_NODE} \
        --enable-interconnect \
        --acl-logging-rate-limit "20" \
        ${gw_interface_flag} \
        --enable-multi-external-gateway=true \
        ${ip_forwarding_flag} \
        ${NETWORK_NODE_IDENTITY_ENABLE}
      
    State:       Waiting
      Reason:    CrashLoopBackOff
    Last State:  Terminated
      Reason:    Error
      Message:   vn-kubernetes/go-controller/pkg/retry.(*RetryFramework).WatchResourceFiltered.func1.1({0xc0007cb368, 0x11})
                 /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/retry/obj_retry.go:531 +0x2c7
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/retry.(*RetryFramework).DoWithLock(0xc000d4eb40, {0xc0007cb368, 0x11}, 0xc000e43dd0)
  /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/retry/obj_retry.go:137 +0xce
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/retry.(*RetryFramework).WatchResourceFiltered.func1({0x22eede0, 0xc000c6fec0})
  /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/retry/obj_retry.go:504 +0x265
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd(...)
  /go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/tools/cache/controller.go:243
k8s.io/client-go/tools/cache.FilteringResourceEventHandler.OnAdd({0xc00111bdc0?, {0x26d0aa0?, 0xc001580570?}}, {0x22eede0, 0xc000c6fec0}, 0xa0?)
  /go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/tools/cache/controller.go:306 +0x6e
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*Handler).OnAdd(...)
  /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/factory/handler.go:52
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.newQueuedInformer.func1.1(0xc000e43da0?)

      Exit Code:    2
      Started:      Sat, 07 Oct 2023 17:14:38 +0800
      Finished:     Sat, 07 Oct 2023 17:14:39 +0800
    Ready:          False
    Restart Count:  6
    Requests:
      cpu:      10m
      memory:   600Mi

Expected results:

Add a validation check for labels? For example, warn that the label key must not be empty and refuse to apply it?
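The expected result above asks for empty label keys to be rejected before they are applied. The sketch below shows one way such a check could look. It assumes the labels arrive as a Go `map[string]string`. `validateLabelKeys` is a hypothetical helper, not an existing ovn-kubernetes function, and the part of this report that names the affected resource is not included in this section. Real Kubernetes code would more likely reuse `validation.IsQualifiedName` from `k8s.io/apimachinery/pkg/util/validation`, which also rejects an empty key.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// validateLabelKeys is a hypothetical pre-apply check. It rejects any label
// whose key is empty or whitespace-only and returns an error describing it,
// so the caller can warn the user instead of applying the configuration.
func validateLabelKeys(labels map[string]string) error {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // iterate in a stable order so errors are deterministic
	for _, k := range keys {
		if strings.TrimSpace(k) == "" {
			return fmt.Errorf("invalid label %q=%q: label key must not be empty", k, labels[k])
		}
	}
	return nil
}

func main() {
	for _, labels := range []map[string]string{
		{"app": "web"},
		{"": "web"},
	} {
		if err := validateLabelKeys(labels); err != nil {
			fmt.Println("warning:", err) // warn and skip instead of applying
			continue
		}
		fmt.Println("ok:", labels)
	}
}
```

Returning an error instead of letting the value reach a watch handler gives the caller a chance to log a warning and skip the bad input, rather than crashing the controller.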

Additional info:

https://github.com/openshift/ovn-kubernetes/pull/1942

Bug OCPBUGS-23290: CannotRetrieveUpdates should provide command-line next-step advice

View the Description View the linked PRs

Description of problem:

The CannotRetrieveUpdates alert currently provides a link to the web console so the responding admin can find the RetrievedUpdates=False message. But some admins lack convenient console access (e.g. they're SSHing into a restricted network, or the cluster does not have the Console capability enabled). Those admins would benefit from oc ... command-line advice.

Version-Release number of selected component (if applicable):

The alert is new in 4.6:

$ for Y in $(seq 5 12); do git --no-pager grep CannotRetrieveUpdates "origin/release-4.${Y}"; done | head -n1
origin/release-4.6:docs/user/status.md:When CVO is unable to retrieve recommended updates the CannotRetrieveUpdates alert will fire containing the reason. This alert will not fire when the reason updates cannot be retrieved is NoChannel.

and has never provided command-line advice.

How reproducible:

Consistently.

Steps to Reproduce:

1. Install a cluster.
2. Set an impossible channel, such as oc adm upgrade channel testing.
3. Wait an hour.
4. Check firing alerts in /monitoring/alerts.
5. Click through to CannotRetrieveUpdates.

Actual results:

Failure to retrieve updates means that cluster administrators...

The description does not provide oc ... advice.

Expected results:

Failure to retrieve updates means that cluster administrators...

The description provides oc ... advice.

https://github.com/openshift/cluster-version-operator/pull/995

Bug OCPBUGS-23342: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/vmware-vsphere-csi-driver-operator/pull/191

Bug OCPBUGS-18485: dev console, silence alert, alert state is changed from Silenced to Firing quickly

View the Description View the linked PRs

Description of problem:

In the developer console, go to "Observe -> openshift-monitoring -> Alerts" and silence the Watchdog alert. At first the alert state is Silenced in the Alerts tab, but it quickly changes to Firing (although the alert is actually silenced). See the attached screenshot.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-09-02-132842

How reproducible:

always

Steps to Reproduce:

1. Silence an alert in the dev console and check the alert state in the Alerts tab.

Actual results:

The alert state quickly changes from Silenced to Firing.

Expected results:

The alert state should remain Silenced.

https://github.com/openshift/console/pull/13151

Bug OCPBUGS-20519: hosted cluster upgrade failure from 4.13 stable to 4.14 nightly

View the Description View the linked PRs

This is just a placeholder bug for 4.15.
The original bug (https://issues.redhat.com/browse/OCPBUGS-20472) does not exist in the 4.15 release.

===

Description of problem:

Prow CI job periodic-ci-openshift-openshift-tests-private-release-4.14-amd64-nightly-4.14-upgrade-from-stable-4.13-aws-ipi-ovn-hypershift-replace-f7 failed at the step that upgrades the HCP image of the hosted cluster.

One failed job: https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/pr-logs/pull/opens[…]-hypershift-replace-f7/1712338041915314176

Version-Release number of selected component (if applicable):

How reproducible:

100%

Steps to Reproduce:

* Retrigger or rehearse the job
or
* Create a 4.13 stable hosted cluster and manually upgrade it to a 4.14 nightly.

Actual results:

The upgrade of the `hostedcluster` to the 4.14 nightly image failed.

Expected results:

The hostedcluster/nodepool upgrade succeeds.

Additional info:

A dump file is available in the job artifacts.

https://github.com/openshift/cluster-network-operator/pull/2065

Bug OCPBUGS-25768: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/azure-disk-csi-driver/pull/72

Bug OCPBUGS-12707: Master MCP is degraded because of MC not found

View the Description View the linked PRs

Description of problem:


When we deploy a cluster on AWS using this template: https://gitlab.cee.redhat.com/aosqe/flexy-templates/-/blob/master/functionality-testing/aos-4_14/ipi-on-aws/versioned-installer-customer_vpc-disconnected_private_cluster-sts-private-s3-custom_endpoints-ci
the master MCP is degraded and reports this error:

  - lastTransitionTime: "2023-04-25T07:48:45Z"
    message: 'Node ip-10-0-55-111.us-east-2.compute.internal is reporting: "machineconfig.machineconfiguration.openshift.io
      \"rendered-master-8ef3f9cb45adb7bbe5f819eb831ffd7d\" not found", Node ip-10-0-60-138.us-east-2.compute.internal
      is reporting: "machineconfig.machineconfiguration.openshift.io \"rendered-master-8ef3f9cb45adb7bbe5f819eb831ffd7d\"
      not found", Node ip-10-0-69-137.us-east-2.compute.internal is reporting: "machineconfig.machineconfiguration.openshift.io
      \"rendered-master-8ef3f9cb45adb7bbe5f819eb831ffd7d\" not found"'
    reason: 3 nodes are reporting degraded status on sync
    status: "True"
    type: NodeDegraded

Version-Release number of selected component (if applicable):

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       False         3h12m   Error while reconciling 4.14.0-0.nightly-2023-04-19-125337: the cluster operator machine-config is degraded

How reproducible:

2 out of 2.

Steps to Reproduce:

1. Install OCP using this template https://gitlab.cee.redhat.com/aosqe/flexy-templates/-/blob/master/functionality-testing/aos-4_14/ipi-on-aws/versioned-installer-customer_vpc-disconnected_private_cluster-sts-private-s3-custom_endpoints-ci

We can see examples of this installation here:
https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/198964/

and here:
https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/199028/


The builds have been marked "keep forever", but just in case, the parameters are:

INSTANCE_NAME_PREFIX: your ID; any short string, just make sure it is unique.
VARIABLES_LOCATION: private-templates/functionality-testing/aos-4_14/ipi-on-aws/versioned-installer-customer_vpc-disconnected_private_cluster-sts-private-s3-custom_endpoints-ci
LAUNCHER_VARS: <leave empty>
BUSHSLICER_CONFIG: <leave empty>

Actual results:


The installation failed, reporting a degraded master MCP.

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       False         3h12m   Error while reconciling 4.14.0-0.nightly-2023-04-19-125337: the cluster operator machine-config is degraded

$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master                                                      False     True       True       3              0                   0                     3                      4h21m
worker   rendered-worker-166729d2617b1b63cf5d9bb818dd9cf8   True      False      False      3              3                   3                     0                      4h21m

Expected results:

Installation should finish without problems and no MCP should be degraded

Additional info:

A must-gather is linked in the first comment.

https://github.com/openshift/installer/pull/7514

Bug OCPBUGS-22600: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/gcp-pd-csi-driver-operator/pull/94

Bug OCPBUGS-23292: Webpack-DevServer Hot-Reload Not Working

View the Description View the linked PRs

Description of problem:

Webpack-DevServer hot reload is not working due to the recent update to Node.js v18.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:

https://github.com/openshift/console/pull/13331

Bug OCPBUGS-21876: pipe can hide errors when using ip command

View the Description View the linked PRs

Description of problem:

If pipefail is active in a bash script, piping ( | ) the output of the ip command can hide the command's actual error when it fails with an exit code other than 1.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:

https://github.com/openshift/ironic-image/pull/408

Bug OCPBUGS-23780: After PatternFly5 update: Pod status ring is missing in topology graph view

View the Description View the linked PRs

Issue 53 from https://docs.google.com/spreadsheets/d/1TR3ENY-GE_LQL9F-xH6NahtRHdu6IrYGhLqDDA8y0EI/edit#gid=1035185624

Topology > Pod rings are missing for Deployments

Screenshot: https://drive.google.com/file/d/1RXCMKjvu2mdO2tQeHe-p5mLbINfmP5u4/view?usp=drive_link

https://github.com/openshift/console/pull/13376

Bug OCPBUGS-8512: WebhookConfiguration caBundle injection is incorrect when some webhooks are already configured

View the Description View the linked PRs

Description of problem:

WebhookConfiguration caBundle injection is incorrect when some webhooks already configured with caBundle.

The behavior seems to be that the first n webhooks in the `.webhooks` array have caBundle injected, where n is the number of webhooks that do not have caBundle set.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1. Create a validatingwebhookconfigurations or mutatingwebhookconfigurations with `service.beta.openshift.io/inject-cabundle: "true"` annotation.

2. oc edit validatingwebhookconfigurations (or oc edit mutatingwebhookconfigurations)

3. Add a new webhook to the end of the `.webhooks` list. Do not set caBundle on it manually, because service-ca should inject it.

4. Observe that the new webhook does not get caBundle injected.

Note: in step 3, it is important that the new webhook is added to the end of the list.

Actual results:

Only the first n webhooks have caBundle injected where n is the number of webhooks without caBundle set.

Expected results:

All webhooks have caBundle injected when they do not have it set.

Additional info:

Open PR here: https://github.com/openshift/service-ca-operator/pull/207

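The issue seems to be a mistake with Go's for-range syntax: the loop variable "i" is the position within the slice of indices to update, not the webhook index stored at that position.

tl;dr: the code should update the webhook at the index stored in each element of the slice, not at that element's position in the slice.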
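The range mistake described above can be reproduced with a small sketch. It assumes the injector first collects the indices of webhooks that lack a caBundle into a slice and then loops over that slice. The `webhook` type and the `injectBuggy`/`injectFixed` functions are hypothetical simplifications for illustration, not the actual service-ca-operator code.

```go
package main

import "fmt"

// webhook is a hypothetical, simplified stand-in for one entry of the
// .webhooks array in a Validating/MutatingWebhookConfiguration.
type webhook struct {
	Name     string
	CABundle []byte
}

// indicesMissingCABundle returns the positions of webhooks with no caBundle set.
func indicesMissingCABundle(hooks []webhook) []int {
	var idx []int
	for i, h := range hooks {
		if len(h.CABundle) == 0 {
			idx = append(idx, i)
		}
	}
	return idx
}

// injectBuggy reproduces the reported mistake: `for i := range idx` yields the
// position inside idx (0..n-1), so the first n webhooks are written instead.
func injectBuggy(hooks []webhook, ca []byte) {
	idx := indicesMissingCABundle(hooks)
	for i := range idx {
		hooks[i].CABundle = ca
	}
}

// injectFixed ranges over the values of idx, which are the webhook indices.
func injectFixed(hooks []webhook, ca []byte) {
	for _, i := range indicesMissingCABundle(hooks) {
		hooks[i].CABundle = ca
	}
}

func main() {
	ca := []byte("ca")
	newHooks := func() []webhook {
		return []webhook{
			{Name: "a", CABundle: []byte("existing")},
			{Name: "b", CABundle: []byte("existing")},
			{Name: "c"}, // appended at the end without a caBundle
		}
	}
	buggy, fixed := newHooks(), newHooks()
	injectBuggy(buggy, ca)
	injectFixed(fixed, ca)
	fmt.Printf("buggy: c=%q\n", buggy[2].CABundle) // c is still empty
	fmt.Printf("fixed: c=%q\n", fixed[2].CABundle) // c received the bundle
}
```

With one webhook appended at the end, `injectBuggy` writes to `hooks[0]` (position 0 of the index slice) and leaves the new webhook empty, which matches the "first n webhooks" behavior in the report; `injectFixed` uses the stored index instead.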

https://github.com/openshift/service-ca-operator/pull/219

Task HOSTEDCP-1206: Create e2e test for request serving isolation mode

View the Description View the linked PRs

We are going to use request serving isolation mode in ROSA. We need an e2e test that keeps us from breaking that functionality as HyperShift development continues.

https://github.com/openshift/hypershift/pull/3150

Bug OCPBUGS-19265: Update 4.15 ose-azure-cloud-node-manager image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/cloud-provider-azure/pull/85

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cloud-provider-azure/pull/85

Bug MON-3551: Fix jq command in local cmo run

View the Description View the linked PRs

Fix jq command in local cmo run

https://github.com/openshift/cluster-monitoring-operator/pull/2180

Bug OCPBUGS-19109: Update 4.15 ose-vsphere-cluster-api-controllers image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/cluster-api-provider-vsphere/pull/17

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Bug OCPBUGS-19190: Update 4.15 ose-machine-api-provider-aws image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/machine-api-provider-aws/pull/82

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/machine-api-provider-aws/pull/82

Bug OCPBUGS-19458: console PR 13114 makes many functions under "Observe > Metrics" unavailable

View the Description View the linked PRs

Description of problem:

Tested https://github.com/openshift/console/pull/13114 with cluster-bot:

launch 4.15,openshift/console#13114 gcp

The functions below are unavailable; see the recording: https://drive.google.com/file/d/1yBS_xGWgJwfIoOdLdIjZ6riSL_cOARrb/view?usp=sharing

1. time interval drop-down
2. Actions drop-down:
Add query
Collapse all query tables

3. Add query button
4. kebab menu:
Disable query
Delete query
Duplicate query

5. disable/enable query toggle button

NOTE: also checked on 4.15.0-0.nightly-arm64-2023-09-19-235618; it has no such issues.

Version-Release number of selected component (if applicable):

Tested https://github.com/openshift/console/pull/13114 with cluster-bot.

How reproducible:

always

Steps to Reproduce:

1. Run regression testing for console PR 13114.

Actual results:

console PR 13114 makes many functions under "Observe > Metrics" unavailable

Expected results:

No issues.

https://github.com/openshift/console/pull/13114

Bug OCPBUGS-24001: Pipeline Builder crashes after a Task was installed from ArtifactHub

View the Description View the linked PRs

After I installed a "Git" Task from ArtifactHub directly from the Pipeline Builder and searched for a "git" Task again, the Pipeline Builder crashed.

Steps to reproduce:

1. Install the Pipelines operator.
2. Navigate to Developer perspective > Pipelines.
3. Press Create to open the Pipeline Builder.
4. Click "Add task".
5. Search for "git".
6. Navigate down to the entry called "git" from ArtifactHub and press Enter to install it. This automatically imports the Task below into the current project. You can also apply that YAML to reproduce this bug.
7. Click "Add task" again.
8. Search for "git".
9. Navigate down to the different git tasks.

Actual behaviour
Page crashes

Expected behaviour
Page should not crash

Additional information
Created/Imported Task:

apiVersion: tekton.dev/v1
kind: Task
metadata:
  annotations:
    openshift.io/installed-from: ArtifactHub
    tekton.dev/categories: Git
    tekton.dev/displayName: git
    tekton.dev/pipelines.minVersion: 0.38.0
    tekton.dev/platforms: 'linux/amd64,linux/s390x,linux/ppc64le,linux/arm64'
    tekton.dev/tags: git
  resourceVersion: '50218855'
  name: git
  uid: 1b88150a-f2c1-4030-9849-c7806c0745d8
  creationTimestamp: '2023-11-28T10:54:51Z'
  generation: 1
  labels:
    app.kubernetes.io/version: 0.1.0
spec:
  description: |
    This Task represents Git and is able to initialize and clone a remote repository on the informed Workspace. It's likely to become the first `step` on a Pipeline. 
  params:
    - description: |
        Git repository URL.
      name: URL
      type: string
    - default: main
      description: |
        Revision to checkout, an branch, tag, sha, ref, etc...
      name: REVISION
      type: string
    - default: ''
      description: |
        Repository `refspec` to fetch before checking out the revision.
      name: REFSPEC
      type: string
    - default: 'true'
      description: |
        Initialize and fetch Git submodules.
      name: SUBMODULES
      type: string
    - default: '1'
      description: |
        Number of commits to fetch, a "shallow clone" is a single commit.
      name: DEPTH
      type: string
    - default: 'true'
      description: |
        Sets the global `http.sslVerify` value, `false` is not advised unless
        you trust the remote repository.
      name: SSL_VERIFY
      type: string
    - default: ca-bundle.crt
      description: |
        Certificate Authority (CA) bundle filename on the `ssl-ca-directory`
        Workspace.
      name: CRT_FILENAME
      type: string
    - default: ''
      description: |
        Relative path to the `output` Workspace where the repository will be
        cloned.
      name: SUBDIRECTORY
      type: string
    - default: ''
      description: |
        List of directory patterns split by comma to perform "sparse checkout".
      name: SPARSE_CHECKOUT_DIRECTORIES
      type: string
    - default: 'true'
      description: |
        Clean out the contents of the `output` Workspace before cloning the
        repository, if data exists.
      name: DELETE_EXISTING
      type: string
    - default: ''
      description: |
        HTTP proxy server (non-TLS requests).
      name: HTTP_PROXY
      type: string
    - default: ''
      description: |
        HTTPS proxy server (TLS requests).
      name: HTTPS_PROXY
      type: string
    - default: ''
      description: |
        Opt out of proxying HTTP/HTTPS requests.
      name: NO_PROXY
      type: string
    - default: 'false'
      description: |
        Log the commands executed.
      name: VERBOSE
      type: string
    - default: /home/git
      description: |
        Absolute path to the Git user home directory.
      name: USER_HOME
      type: string
  results:
    - description: |
        The precise commit SHA digest cloned.
      name: COMMIT
      type: string
    - description: |
        The precise repository URL.
      name: URL
      type: string
    - description: |
        The epoch timestamp of the commit cloned.
      name: COMMITTER_DATE
      type: string
  stepTemplate:
    computeResources:
      limits:
        cpu: 100m
        memory: 256Mi
      requests:
        cpu: 100m
        memory: 256Mi
    env:
      - name: PARAMS_URL
        value: $(params.URL)
      - name: PARAMS_REVISION
        value: $(params.REVISION)
      - name: PARAMS_REFSPEC
        value: $(params.REFSPEC)
      - name: PARAMS_SUBMODULES
        value: $(params.SUBMODULES)
      - name: PARAMS_DEPTH
        value: $(params.DEPTH)
      - name: PARAMS_SSL_VERIFY
        value: $(params.SSL_VERIFY)
      - name: PARAMS_CRT_FILENAME
        value: $(params.CRT_FILENAME)
      - name: PARAMS_SUBDIRECTORY
        value: $(params.SUBDIRECTORY)
      - name: PARAMS_SPARSE_CHECKOUT_DIRECTORIES
        value: $(params.SPARSE_CHECKOUT_DIRECTORIES)
      - name: PARAMS_DELETE_EXISTING
        value: $(params.DELETE_EXISTING)
      - name: PARAMS_HTTP_PROXY
        value: $(params.HTTP_PROXY)
      - name: PARAMS_HTTPS_PROXY
        value: $(params.HTTPS_PROXY)
      - name: PARAMS_NO_PROXY
        value: $(params.NO_PROXY)
      - name: PARAMS_VERBOSE
        value: $(params.VERBOSE)
      - name: PARAMS_USER_HOME
        value: $(params.USER_HOME)
      - name: WORKSPACES_OUTPUT_PATH
        value: $(workspaces.output.path)
      - name: WORKSPACES_SSH_DIRECTORY_BOUND
        value: $(workspaces.ssh-directory.bound)
      - name: WORKSPACES_SSH_DIRECTORY_PATH
        value: $(workspaces.ssh-directory.path)
      - name: WORKSPACES_BASIC_AUTH_BOUND
        value: $(workspaces.basic-auth.bound)
      - name: WORKSPACES_BASIC_AUTH_PATH
        value: $(workspaces.basic-auth.path)
      - name: WORKSPACES_SSL_CA_DIRECTORY_BOUND
        value: $(workspaces.ssl-ca-directory.bound)
      - name: WORKSPACES_SSL_CA_DIRECTORY_PATH
        value: $(workspaces.ssl-ca-directory.path)
      - name: RESULTS_COMMITTER_DATE_PATH
        value: $(results.COMMITTER_DATE.path)
      - name: RESULTS_COMMIT_PATH
        value: $(results.COMMIT.path)
      - name: RESULTS_URL_PATH
        value: $(results.URL.path)
    securityContext:
      runAsNonRoot: true
      runAsUser: 65532
  steps:
    - computeResources: {}
      image: 'gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/git-init:latest'
      name: load-scripts
      script: |
        printf '%s' "IyEvdXNyL2Jpbi9lbnYgc2gKCmV4cG9ydCBQQVJBTVNfVVJMPSIke1BBUkFNU19VUkw6LX0iCmV4cG9ydCBQQVJBTVNfUkVWSVNJT049IiR7UEFSQU1TX1JFVklTSU9OOi19IgpleHBvcnQgUEFSQU1TX1JFRlNQRUM9IiR7UEFSQU1TX1JFRlNQRUM6LX0iCmV4cG9ydCBQQVJBTVNfU1VCTU9EVUxFUz0iJHtQQVJBTVNfU1VCTU9EVUxFUzotfSIKZXhwb3J0IFBBUkFNU19ERVBUSD0iJHtQQVJBTVNfREVQVEg6LX0iCmV4cG9ydCBQQVJBTVNfU1NMX1ZFUklGWT0iJHtQQVJBTVNfU1NMX1ZFUklGWTotfSIKZXhwb3J0IFBBUkFNU19DUlRfRklMRU5BTUU9IiR7UEFSQU1TX0NSVF9GSUxFTkFNRTotfSIKZXhwb3J0IFBBUkFNU19TVUJESVJFQ1RPUlk9IiR7UEFSQU1TX1NVQkRJUkVDVE9SWTotfSIKZXhwb3J0IFBBUkFNU19TUEFSU0VfQ0hFQ0tPVVRfRElSRUNUT1JJRVM9IiR7UEFSQU1TX1NQQVJTRV9DSEVDS09VVF9ESVJFQ1RPUklFUzotfSIKZXhwb3J0IFBBUkFNU19ERUxFVEVfRVhJU1RJTkc9IiR7UEFSQU1TX0RFTEVURV9FWElTVElORzotfSIKZXhwb3J0IFBBUkFNU19IVFRQX1BST1hZPSIke1BBUkFNU19IVFRQX1BST1hZOi19IgpleHBvcnQgUEFSQU1TX0hUVFBTX1BST1hZPSIke1BBUkFNU19IVFRQU19QUk9YWTotfSIKZXhwb3J0IFBBUkFNU19OT19QUk9YWT0iJHtQQVJBTVNfTk9fUFJPWFk6LX0iCmV4cG9ydCBQQVJBTVNfVkVSQk9TRT0iJHtQQVJBTVNfVkVSQk9TRTotfSIKZXhwb3J0IFBBUkFNU19VU0VSX0hPTUU9IiR7UEFSQU1TX1VTRVJfSE9NRTotfSIKCmV4cG9ydCBXT1JLU1BBQ0VTX09VVFBVVF9QQVRIPSIke1dPUktTUEFDRVNfT1VUUFVUX1BBVEg6LX0iCmV4cG9ydCBXT1JLU1BBQ0VTX1NTSF9ESVJFQ1RPUllfQk9VTkQ9IiR7V09SS1NQQUNFU19TU0hfRElSRUNUT1JZX0JPVU5EOi19IgpleHBvcnQgV09SS1NQQUNFU19TU0hfRElSRUNUT1JZX1BBVEg9IiR7V09SS1NQQUNFU19TU0hfRElSRUNUT1JZX1BBVEg6LX0iCmV4cG9ydCBXT1JLU1BBQ0VTX0JBU0lDX0FVVEhfQk9VTkQ9IiR7V09SS1NQQUNFU19CQVNJQ19BVVRIX0JPVU5EOi19IgpleHBvcnQgV09SS1NQQUNFU19CQVNJQ19BVVRIX1BBVEg9IiR7V09SS1NQQUNFU19CQVNJQ19BVVRIX1BBVEg6LX0iCmV4cG9ydCBXT1JLU1BBQ0VTX1NTTF9DQV9ESVJFQ1RPUllfQk9VTkQ9IiR7V09SS1NQQUNFU19TU0xfQ0FfRElSRUNUT1JZX0JPVU5EOi19IgpleHBvcnQgV09SS1NQQUNFU19TU0xfQ0FfRElSRUNUT1JZX1BBVEg9IiR7V09SS1NQQUNFU19TU0xfQ0FfRElSRUNUT1JZX1BBVEg6LX0iCgpleHBvcnQgUkVTVUxUU19DT01NSVRURVJfREFURV9QQVRIPSIke1JFU1VMVFNfQ09NTUlUVEVSX0RBVEVfUEFUSDotfSIKZXhwb3J0IFJFU1VMVFNfQ09NTUlUX1BBVEg9IiR7UkVTVUxUU19DT01NSVRfUEFUSDotfSIKZXhwb3J0IFJFU1VMVFNfVVJMX1BBVEg9IiR7UkVTVUxUU19VUkxfUEFUSDotfSIKCiMgZnVsbCBwYXRoIHRvIHRoZSBjaGV
ja291dCBkaXJlY3RvcnksIHVzaW5nIHRoZSBvdXRwdXQgd29ya3NwYWNlIGFuZCBzdWJkaXJlY3RvciBwYXJhbWV0ZXIKZXhwb3J0IGNoZWNrb3V0X2Rpcj0iJHtXT1JLU1BBQ0VTX09VVFBVVF9QQVRIfS8ke1BBUkFNU19TVUJESVJFQ1RPUll9IgoKIwojIEZ1bmN0aW9ucwojCgpmYWlsKCkgewogICAgZWNobyAiRVJST1I6ICR7QH0iIDE+JjIKICAgIGV4aXQgMQp9CgpwaGFzZSgpIHsKICAgIGVjaG8gIi0tLT4gUGhhc2U6ICR7QH0uLi4iCn0KCiMgSW5zcGVjdCB0aGUgZW52aXJvbm1lbnQgdmFyaWFibGVzIHRvIGFzc2VydCB0aGUgbWluaW11bSBjb25maWd1cmF0aW9uIGlzIGluZm9ybWVkLgphc3NlcnRfcmVxdWlyZWRfY29uZmlndXJhdGlvbl9vcl9mYWlsKCkgewogICAgW1sgLXogIiR7UEFSQU1TX1VSTH0iIF1dICYmCiAgICAgICAgZmFpbCAiUGFyYW1ldGVyIFVSTCBpcyBub3Qgc2V0ISIKCiAgICBbWyAteiAiJHtXT1JLU1BBQ0VTX09VVFBVVF9QQVRIfSIgXV0gJiYKICAgICAgICBmYWlsICJPdXRwdXQgV29ya3NwYWNlIGlzIG5vdCBzZXQhIgoKICAgIFtbICEgLWQgIiR7V09SS1NQQUNFU19PVVRQVVRfUEFUSH0iIF1dICYmCiAgICAgICAgZmFpbCAiT3V0cHV0IFdvcmtzcGFjZSBkaXJlY3RvcnkgJyR7V09SS1NQQUNFU19PVVRQVVRfUEFUSH0nIG5vdCBmb3VuZCEiCgogICAgcmV0dXJuIDAKfQoKIyBDb3B5IHRoZSBmaWxlIGludG8gdGhlIGRlc3RpbmF0aW9uLCBjaGVja2luZyBpZiB0aGUgc291cmNlIGV4aXN0cy4KY29weV9vcl9mYWlsKCkgewogICAgbG9jYWwgX21vZGU9IiR7MX0iCiAgICBsb2NhbCBfc3JjPSIkezJ9IgogICAgbG9jYWwgX2RzdD0iJHszfSIKCiAgICBpZiBbWyAhIC1mICIke19zcmN9IiAmJiAhIC1kICIke19zcmN9IiBdXTsgdGhlbgogICAgICAgIGZhaWwgIlNvdXJjZSBmaWxlL2RpcmVjdG9yeSBpcyBub3QgZm91bmQgYXQgJyR7X3NyY30nIgogICAgZmkKCiAgICBpZiBbWyAtZCAiJHtfc3JjfSIgXV07IHRoZW4KICAgICAgICBjcCAtUnYgJHtfc3JjfSAke19kc3R9CiAgICAgICAgY2htb2QgLXYgJHtfbW9kZX0gJHtfZHN0fQogICAgZWxzZQogICAgICAgIGluc3RhbGwgLS12ZXJib3NlIC0tbW9kZT0ke19tb2RlfSAke19zcmN9ICR7X2RzdH0KICAgIGZpCn0KCiMgRGVsZXRlIGFueSBleGlzdGluZyBjb250ZW50cyBvZiB0aGUgcmVwbyBkaXJlY3RvcnkgaWYgaXQgZXhpc3RzLiBXZSBkb24ndCBqdXN0ICJybSAtcmYgPGRpcj4iCiMgYmVjYXVzZSBtaWdodCBiZSAiLyIgb3IgdGhlIHJvb3Qgb2YgYSBtb3VudGVkIHZvbHVtZS4KY2xlYW5fZGlyKCkgewogICAgbG9jYWwgX2Rpcj0iJHsxfSIKCiAgICBbWyAhIC1kICIke19kaXJ9IiBdXSAmJgogICAgICAgIHJldHVybiAwCgogICAgIyBEZWxldGUgbm9uLWhpZGRlbiBmaWxlcyBhbmQgZGlyZWN0b3JpZXMKICAgIHJtIC1yZnYgJHtfZGlyOj99LyoKICAgICMgRGVsZXRlIGZpbGVzIGFuZCBkaXJlY3RvcmllcyBzdGFydGluZyB3aXRoIC4gYnV0IGV
4Y2x1ZGluZyAuLgogICAgcm0gLXJmdiAke19kaXJ9Ly5bIS5dKgogICAgIyBEZWxldGUgZmlsZXMgYW5kIGRpcmVjdG9yaWVzIHN0YXJ0aW5nIHdpdGggLi4gcGx1cyBhbnkgb3RoZXIgY2hhcmFjdGVyCiAgICBybSAtcmZ2ICR7X2Rpcn0vLi4/Kgp9CgojCiMgU2V0dGluZ3MKIwoKIyB3aGVuIHRoZSBrby1hcHAgZGlyZWN0b3J5IGlzIHByZXNlbnQsIG1ha2luZyBzdXJlIGl0J3MgcGFydCBvZiB0aGUgUEFUSApbWyAtZCAiL2tvLWFwcCIgXV0gJiYgZXhwb3J0IFBBVEg9IiR7UEFUSH06L2tvLWFwcCIKCiMgbWFraW5nIHRoZSBzaGVsbCB2ZXJib3NlIHdoZW4gdGhlIHBhcmFtdGVyIGlzIHNldApbWyAiJHtQQVJBTVNfVkVSQk9TRX0iID09ICJ0cnVlIiBdXSAmJiBzZXQgLXgKCnJldHVybiAw" |base64 -d >common.sh
        chmod +x "common.sh"
        printf '%s' "IyEvdXNyL2Jpbi9lbnYgc2gKIwojIEV4cG9ydHMgcHJveHkgYW5kIGN1c3RvbSBTU0wgQ0EgY2VydGlmaWNhdHMgaW4gdGhlIGVudmlyb21lbnQgYW5kIHJ1bnMgdGhlIGdpdC1pbml0IHdpdGggZmxhZ3MKIyBiYXNlZCBvbiB0aGUgdGFzayBwYXJhbWV0ZXJzLgojCgpzZXQgLWV1Cgpzb3VyY2UgJChDRFBBVEg9IGNkIC0tICIkKGRpcm5hbWUgLS0gJHswfSkiICYmIHB3ZCkvY29tbW9uLnNoCgphc3NlcnRfcmVxdWlyZWRfY29uZmlndXJhdGlvbl9vcl9mYWlsCgojCiMgQ0EgKGBzc2wtY2EtZGlyZWN0b3J5YCBXb3Jrc3BhY2UpCiMKCmlmIFtbICIke1dPUktTUEFDRVNfU1NMX0NBX0RJUkVDVE9SWV9CT1VORH0iID09ICJ0cnVlIiAmJiAtbiAiJHtQQVJBTVNfQ1JUX0ZJTEVOQU1FfSIgXV07IHRoZW4KCXBoYXNlICJJbnNwZWN0aW5nICdzc2wtY2EtZGlyZWN0b3J5JyB3b3Jrc3BhY2UgbG9va2luZyBmb3IgJyR7UEFSQU1TX0NSVF9GSUxFTkFNRX0nIGZpbGUiCgljcnQ9IiR7V09SS1NQQUNFU19TU0xfQ0FfRElSRUNUT1JZX1BBVEh9LyR7UEFSQU1TX0NSVF9GSUxFTkFNRX0iCglbWyAhIC1mICIke2NydH0iIF1dICYmCgkJZmFpbCAiQ1JUIGZpbGUgKFBBUkFNU19DUlRfRklMRU5BTUUpIG5vdCBmb3VuZCBhdCAnJHtjcnR9JyIKCglwaGFzZSAiRXhwb3J0aW5nIGN1c3RvbSBDQSBjZXJ0aWZpY2F0ZSAnR0lUX1NTTF9DQUlORk89JHtjcnR9JyIKCWV4cG9ydCBHSVRfU1NMX0NBSU5GTz0ke2NydH0KZmkKCiMKIyBQcm94eSBTZXR0aW5ncwojCgpwaGFzZSAiU2V0dGluZyB1cCBIVFRQX1BST1hZPScke1BBUkFNU19IVFRQX1BST1hZfSciCltbIC1uICIke1BBUkFNU19IVFRQX1BST1hZfSIgXV0gJiYgZXhwb3J0IEhUVFBfUFJPWFk9IiR7UEFSQU1TX0hUVFBfUFJPWFl9IgoKcGhhc2UgIlNldHR0aW5nIHVwIEhUVFBTX1BST1hZPScke1BBUkFNU19IVFRQU19QUk9YWX0nIgpbWyAtbiAiJHtQQVJBTVNfSFRUUFNfUFJPWFl9IiBdXSAmJiBleHBvcnQgSFRUUFNfUFJPWFk9IiR7UEFSQU1TX0hUVFBTX1BST1hZfSIKCnBoYXNlICJTZXR0aW5nIHVwIE5PX1BST1hZPScke1BBUkFNU19OT19QUk9YWX0nIgpbWyAtbiAiJHtQQVJBTVNfTk9fUFJPWFl9IiBdXSAmJiBleHBvcnQgTk9fUFJPWFk9IiR7UEFSQU1TX05PX1BST1hZfSIKCiMKIyBHaXQgQ2xvbmUKIwoKcGhhc2UgIlNldHRpbmcgb3V0cHV0IHdvcmtzcGFjZSBhcyBzYWZlIGRpcmVjdG9yeSAoJyR7V09SS1NQQUNFU19PVVRQVVRfUEFUSH0nKSIKZ2l0IGNvbmZpZyAtLWdsb2JhbCAtLWFkZCBzYWZlLmRpcmVjdG9yeSAiJHtXT1JLU1BBQ0VTX09VVFBVVF9QQVRIfSIKCnBoYXNlICJDbG9uaW5nICcke1BBUkFNU19VUkx9JyBpbnRvICcke2NoZWNrb3V0X2Rpcn0nIgpzZXQgLXgKZXhlYyBnaXQtaW5pdCBcCgktdXJsPSIke1BBUkFNU19VUkx9IiBcCgktcmV2aXNpb249IiR7UEFSQU1TX1JFVklTSU9OfSIgXAoJLXJlZnNwZWM9IiR7UEFSQU1TX1JFRlNQRUN9IiB
cCgktcGF0aD0iJHtjaGVja291dF9kaXJ9IiBcCgktc3NsVmVyaWZ5PSIke1BBUkFNU19TU0xfVkVSSUZZfSIgXAoJLXN1Ym1vZHVsZXM9IiR7UEFSQU1TX1NVQk1PRFVMRVN9IiBcCgktZGVwdGg9IiR7UEFSQU1TX0RFUFRIfSIgXAoJLXNwYXJzZUNoZWNrb3V0RGlyZWN0b3JpZXM9IiR7UEFSQU1TX1NQQVJTRV9DSEVDS09VVF9ESVJFQ1RPUklFU30iCg==" |base64 -d >git-clone.sh
        chmod +x "git-clone.sh"
        printf '%s' "IyEvdXNyL2Jpbi9lbnYgc2gKIwojIFNldHMgdXAgdGhlIGJhc2ljIGFuZCBTU0ggYXV0aGVudGljYXRpb24gYmFzZWQgb24gaW5mb3JtZWQgd29ya3NwYWNlcywgYXMgd2VsbCBhcyBjbGVhbmluZyB1cCB0aGUKIyBwcmV2aW91cyBnaXQtY2xvbmUgc3RhbGUgZGF0YS4KIwoKc2V0IC1ldQoKc291cmNlICQoQ0RQQVRIPSBjZCAtLSAiJChkaXJuYW1lIC0tICR7MH0pIiAmJiBwd2QpL2NvbW1vbi5zaAoKYXNzZXJ0X3JlcXVpcmVkX2NvbmZpZ3VyYXRpb25fb3JfZmFpbAoKcGhhc2UgIlByZXBhcmluZyB0aGUgZmlsZXN5c3RlbSBiZWZvcmUgY2xvbmluZyB0aGUgcmVwb3NpdG9yeSIKCmlmIFtbICIke1dPUktTUEFDRVNfQkFTSUNfQVVUSF9CT1VORH0iID09ICJ0cnVlIiBdXTsgdGhlbgoJcGhhc2UgIkNvbmZpZ3VyaW5nIEdpdCBhdXRoZW50aWNhdGlvbiB3aXRoICdiYXNpYy1hdXRoJyBXb3Jrc3BhY2UgZmlsZXMiCgoJZm9yIGYgaW4gLmdpdC1jcmVkZW50aWFscyAuZ2l0Y29uZmlnOyBkbwoJCXNyYz0iJHtXT1JLU1BBQ0VTX0JBU0lDX0FVVEhfUEFUSH0vJHtmfSIKCQlwaGFzZSAiQ29weWluZyAnJHtzcmN9JyB0byAnJHtQQVJBTVNfVVNFUl9IT01FfSciCgkJY29weV9vcl9mYWlsIDQwMCAke3NyY30gIiR7UEFSQU1TX1VTRVJfSE9NRX0vIgoJZG9uZQpmaQoKaWYgW1sgIiR7V09SS1NQQUNFU19TU0hfRElSRUNUT1JZX0JPVU5EfSIgPT0gInRydWUiIF1dOyB0aGVuCglwaGFzZSAiQ29weWluZyAnLnNzaCcgZnJvbSBzc2gtZGlyZWN0b3J5IHdvcmtzcGFjZSAoJyR7V09SS1NQQUNFU19TU0hfRElSRUNUT1JZX1BBVEh9JykiCgoJZG90X3NzaD0iJHtQQVJBTVNfVVNFUl9IT01FfS8uc3NoIgoJY29weV9vcl9mYWlsIDcwMCAke1dPUktTUEFDRVNfU1NIX0RJUkVDVE9SWV9QQVRIfSAke2RvdF9zc2h9CgljaG1vZCAtUnYgNDAwICR7ZG90X3NzaH0vKgpmaQoKaWYgW1sgIiR7UEFSQU1TX0RFTEVURV9FWElTVElOR30iID09ICJ0cnVlIiBdXTsgdGhlbgoJcGhhc2UgIkRlbGV0aW5nIGFsbCBjb250ZW50cyBvZiBjaGVja291dC1kaXIgJyR7Y2hlY2tvdXRfZGlyfSciCgljbGVhbl9kaXIgJHtjaGVja291dF9kaXJ9IHx8IHRydWUKZmkKCmV4aXQgMA==" |base64 -d >prepare.sh
        chmod +x "prepare.sh"
        printf '%s' "IyEvdXNyL2Jpbi9lbnYgc2gKIwojIFNjYW4gdGhlIGNsb25lZCByZXBvc2l0b3J5IGluIG9yZGVyIHRvIHJlcG9ydCBkZXRhaWxzIHdyaXR0aW5nIHRoZSByZXN1bHQgZmlsZXMuCiMKCnNldCAtZXUKCnNvdXJjZSAkKENEUEFUSD0gY2QgLS0gIiQoZGlybmFtZSAtLSAkezB9KSIgJiYgcHdkKS9jb21tb24uc2gKCmFzc2VydF9yZXF1aXJlZF9jb25maWd1cmF0aW9uX29yX2ZhaWwKCnBoYXNlICJDb2xsZWN0aW5nIGNsb25lZCByZXBvc2l0b3J5IGluZm9ybWF0aW9uICgnJHtjaGVja291dF9kaXJ9JykiCgpjZCAiJHtjaGVja291dF9kaXJ9IiB8fCBmYWlsICJOb3QgYWJsZSB0byBlbnRlciBjaGVja291dC1kaXIgJyR7Y2hlY2tvdXRfZGlyfSciCgpwaGFzZSAiU2V0dGluZyBvdXRwdXQgd29ya3NwYWNlIGFzIHNhZmUgZGlyZWN0b3J5ICgnJHtXT1JLU1BBQ0VTX09VVFBVVF9QQVRIfScpIgpnaXQgY29uZmlnIC0tZ2xvYmFsIC0tYWRkIHNhZmUuZGlyZWN0b3J5ICIke1dPUktTUEFDRVNfT1VUUFVUX1BBVEh9IgoKcmVzdWx0X3NoYT0iJChnaXQgcmV2LXBhcnNlIEhFQUQpIgpyZXN1bHRfY29tbWl0dGVyX2RhdGU9IiQoZ2l0IGxvZyAtMSAtLXByZXR0eT0lY3QpIgoKcGhhc2UgIlJlcG9ydGluZyBsYXN0IGNvbW1pdCBkYXRlICcke3Jlc3VsdF9jb21taXR0ZXJfZGF0ZX0nIgpwcmludGYgIiVzIiAiJHtyZXN1bHRfY29tbWl0dGVyX2RhdGV9IiA+JHtSRVNVTFRTX0NPTU1JVFRFUl9EQVRFX1BBVEh9CgpwaGFzZSAiUmVwb3J0aW5nIHBhcnNlZCByZXZpc2lvbiBTSEEgJyR7cmVzdWx0X3NoYX0nIgpwcmludGYgIiVzIiAiJHtyZXN1bHRfc2hhfSIgPiR7UkVTVUxUU19DT01NSVRfUEFUSH0KCnBoYXNlICJSZXBvcnRpbmcgcmVwb3NpdG9yeSBVUkwgJyR7UEFSQU1TX1VSTH0nIgpwcmludGYgIiVzIiAiJHtQQVJBTVNfVVJMfSIgPiR7UkVTVUxUU19VUkxfUEFUSH0KCmV4aXQgMA==" |base64 -d >report.sh
        chmod +x "report.sh"
      volumeMounts:
        - mountPath: /scripts
          name: scripts-dir
      workingDir: /scripts
    - command:
        - /scripts/prepare.sh
      computeResources: {}
      image: 'gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/git-init:latest'
      name: prepare
      volumeMounts:
        - mountPath: /scripts
          name: scripts-dir
        - mountPath: $(params.USER_HOME)
          name: user-home
    - command:
        - /scripts/git-clone.sh
      computeResources: {}
      image: 'gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/git-init:latest'
      name: git-clone
      volumeMounts:
        - mountPath: /scripts
          name: scripts-dir
        - mountPath: $(params.USER_HOME)
          name: user-home
    - command:
        - /scripts/report.sh
      computeResources: {}
      image: 'gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/git-init:latest'
      name: report
      volumeMounts:
        - mountPath: /scripts
          name: scripts-dir
  volumes:
    - emptyDir: {}
      name: user-home
    - emptyDir: {}
      name: scripts-dir
  workspaces:
    - description: |
        The Git repository directory. Data will be placed at the root of the
        Workspace, or at the relative path defined by the SUBDIRECTORY
        parameter.
      name: output
    - description: |
        A `.ssh` directory with private key, `known_hosts`, `config`, etc.
        Copied to the Git user's home before cloning the repository, in order to
        serve as an authentication mechanism. Binding a Secret to this Workspace is
        strongly recommended over other volume types.
      name: ssh-directory
      optional: true
    - description: |
        A Workspace containing `.gitconfig` and `.git-credentials` files.
        These will be copied to the user's home before Git commands run. All
        other files in this Workspace are ignored. It is strongly recommended to
        use `ssh-directory` over `basic-auth` whenever possible, and to bind a
        Secret to this Workspace over other volume types.
      name: basic-auth
      optional: true
    - description: |
        A Workspace containing CA certificates; these will be used by Git to
        verify the peer when interacting with remote repositories using
        HTTPS.
      name: ssl-ca-directory
      optional: true

https://github.com/openshift/console/pull/13379

Bug OCPBUGS-25697: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/azure-file-csi-driver/pull/50

Story TRT-1347: Disable Image Registry Disruption Monitoring When Not HA

~~OCPBUGS-18596~~ and ~~OCPBUGS-22382~~ track image registry disruption issues on metal and vSphere jobs. By default, the image registry is not enabled on these platforms, but it is enabled, in a non-HA manner, for the tests. During discussion of the issue, it was decided that unless and until these teams support HA deployments of the image registry, we should not monitor them for disruption.

Devan floated the idea of checking whether the image registry deployment has more than one replica and, if not, selectively disabling disruption monitoring.
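A minimal sketch of that check, assuming "HA" simply means more than one replica. `registryDeployment` and `shouldMonitorRegistryDisruption` are hypothetical names, not the code from openshift/origin#28425; a registry that is not deployed at all is modeled as a nil pointer.

```go
package main

// registryDeployment is a hypothetical, trimmed-down view of the image
// registry Deployment; only spec.replicas matters for this decision.
type registryDeployment struct {
	// Replicas mirrors spec.replicas, which is a pointer in the Kubernetes
	// API; nil means the API server defaults it to 1.
	Replicas *int32
}

// shouldMonitorRegistryDisruption reports whether disruption monitoring is
// worthwhile: only an HA registry (more than one replica) is expected to
// stay available while individual pods are rolled.
func shouldMonitorRegistryDisruption(d *registryDeployment) bool {
	if d == nil || d.Replicas == nil {
		// No deployment, or replicas left at the default of 1: not HA.
		return false
	}
	return *d.Replicas > 1
}
```

With this rule, the single-replica registries used in the metal and vSphere test jobs would be skipped, while HA deployments would still be monitored.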

https://github.com/openshift/origin/pull/28425

Bug OCPBUGS-22976: S2I Build Wizard should check for Containerfile in addition to Dockerfile

Description of problem:

A GitHub project with a Containerfile instead of a Dockerfile is not recognized as a Buildah target, so the wizard falls through to templating it as a standard (language) project.
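The fix amounts to treating a Containerfile the same way as a Dockerfile when classifying the repository. A minimal sketch of that classification, written in Go; `detectBuildFile` is a hypothetical helper rather than the console's actual detection code (openshift/console#13378), and preferring Containerfile when both files exist is an assumption.

```go
package main

// buildFileCandidates lists file names that mark a repository as a
// container build target, in order of preference (an assumption).
var buildFileCandidates = []string{"Containerfile", "Dockerfile"}

// detectBuildFile returns the first candidate present in the repository's
// top-level file list. It returns false when none is found, in which case
// the wizard would fall back to language detection.
func detectBuildFile(repoFiles []string) (string, bool) {
	present := make(map[string]bool, len(repoFiles))
	for _, f := range repoFiles {
		present[f] = true
	}
	for _, c := range buildFileCandidates {
		if present[c] {
			return c, true
		}
	}
	return "", false
}
```

For a repository like the jumble-c example, which ships only a Containerfile, this returns `"Containerfile", true` instead of falling through to a language builder.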

Version-Release number of selected component (if applicable):


Server Version: 4.13.18
Kubernetes Version: v1.26.9+c7606e7

How reproducible:

Always

Steps to Reproduce:

1. Create a git application with Containerfile, e.g. https://github.com/cwilkers/jumble-c
2. Use the Developer view to add the app as a git repo
3. Observe failure as project is not built properly due to ignoring Containerfile

Actual results:

Build failure

Expected results:

Buildah uses the Containerfile, which includes the HTML and other resources required for the app.

Additional info:

https://github.com/cwilkers/jumble-c

https://github.com/openshift/console/pull/13378

Bug OCPBUGS-27017: [release-4.15] replace instanceAdmin role with specific compute permissions

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/cluster-storage-operator/pull/439

Bug OCPBUGS-18250: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/prometheus-alertmanager/pull/74

Bug OCPBUGS-23668: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/machine-api-provider-openstack/pull/97

Bug OCPBUGS-24816: Update 4.16 ose-machine-os-images-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/machine-os-images/pull/34

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/machine-os-images/pull/34

Bug OCPBUGS-18103: Panic detected in pod on 4.14 PowerVS CI runs

Description:

Now that the large number of e2e test case failures in CI jobs has been resolved, recent jobs show an "Undiagnosed panic detected in pod" issue.

JobLink

Error:

{
pods/openshift-image-registry_cluster-image-registry-operator-7f7bd7c9b4-k8fmh_cluster-image-registry-operator_previous.log.gz:E0825 02:44:06.686400 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
pods/openshift-image-registry_cluster-image-registry-operator-7f7bd7c9b4-k8fmh_cluster-image-registry-operator_previous.log.gz:E0825 02:44:06.686630 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
}

Some Observations:
1) While starting, ImageConfigController logged "Failed to watch *v1.Route: the server could not find the requested resource".

2) This eventually led to a sync problem: "E0825 01:26:52.428694 1 clusteroperator.go:104] unable to sync ClusterOperatorStatusController: config.imageregistry.operator.openshift.io "cluster" not found, requeuing"

3) Then, while creating the deployment resource for "cluster-image-registry-operator", the operator panicked with: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)

https://github.com/openshift/cluster-image-registry-operator/pull/909

Bug OCPBUGS-23649: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/openshift-state-metrics/pull/109

Bug OCPBUGS-24095: Update 4.15 openshift-enterprise-registry-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/image-registry/pull/387

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/image-registry/pull/387

Bug OCPBUGS-18902: Internal Registry Secrets merge causing excessive API calls

In a recently merged PR, a number of API calls do not use caches, causing excessive calls.

Done when:

- Change all Get() calls to use listers.
- The API call metric decreases.
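Why listers reduce API traffic can be illustrated with a small, stdlib-only simulation. This is not client-go: `apiClient` and `cachedLister` are hypothetical stand-ins that count reads, and the real informer-backed lister also keeps its cache current through a WATCH, which this sketch omits. The secret name is only an example key.

```go
package main

// secretGetter is a stand-in for either a typed client or a lister; both
// look objects up by namespace and name.
type secretGetter interface {
	Get(namespace, name string) (string, bool)
}

// apiClient simulates direct API server reads and counts every call.
type apiClient struct {
	data  map[string]string
	calls int
}

func (c *apiClient) Get(namespace, name string) (string, bool) {
	c.calls++ // every Get is a round trip to the API server
	v, ok := c.data[namespace+"/"+name]
	return v, ok
}

// cachedLister simulates an informer-backed lister: it is filled once
// (in client-go, by an initial LIST) and then serves reads from memory.
type cachedLister struct {
	cache map[string]string
}

func newCachedLister(src *apiClient) *cachedLister {
	src.calls++ // the single initial LIST
	cp := make(map[string]string, len(src.data))
	for k, v := range src.data {
		cp[k] = v
	}
	return &cachedLister{cache: cp}
}

func (l *cachedLister) Get(namespace, name string) (string, bool) {
	v, ok := l.cache[namespace+"/"+name] // no API call
	return v, ok
}

// lookupMany performs n reads through whichever getter it is given,
// standing in for a controller that syncs repeatedly.
func lookupMany(g secretGetter, n int) {
	for i := 0; i < n; i++ {
		g.Get("openshift-config", "pull-secret")
	}
}
```

With 100 syncs, the direct client makes 100 API calls, while the lister makes one initial LIST and serves the rest from memory, which is the decrease the "done when" criterion expects to see in the API call metric.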

https://github.com/openshift/machine-config-operator/pull/3912

Bug OCPBUGS-22225: Remove wildfly docker.io samples

The Samples Operator in OKD refers to docker.io/openshift/wildfly images, which are no longer available. The library sync should update the samples to use quay.io links.

https://github.com/openshift/cluster-samples-operator/pull/519

Bug OCPBUGS-20267: [GCP] Unit tests have deadlock condition in termination handler

Description of problem:

Due to the way the termination handler unit tests are configured, the counter of HTTP requests to the mock handler can, in some cases, cause the test to deadlock and time out. This happens randomly, because the ordering of the tests affects when the bug occurs.

Version-Release number of selected component (if applicable):

4.13+

How reproducible:

It happens randomly when run in CI or when the full suite is run, but it happens every time if the tests are focused.
Focusing on "poll URL cannot be reached" reliably triggers the deadlock.

Steps to Reproduce:

1. add `-focus "poll URL cannot be reached"` to unit test ginkgo arguments
2. run `make unit`

Actual results:

test suite hangs after this output:
"Handler Suite when running the handler when polling the termination endpoint and the poll URL cannot be reached should return an error /home/mike/dev/machine-api-provider-aws/pkg/termination/handler_test.go:197"

Expected results:

Tests pass

Additional info:

To fix this, we need to isolate the test in its own Context block; this patch should do the trick:

diff --git a/pkg/termination/handler_test.go b/pkg/termination/handler_test.go
index 2b98b08b..0f85feae 100644
--- a/pkg/termination/handler_test.go
+++ b/pkg/termination/handler_test.go
@@ -187,7 +187,9 @@ var _ = Describe("Handler Suite", func() {
                                        Consistently(nodeMarkedForDeletion(testNode.Name)).Should(BeFalse())
                                })
                        })
+               })
 
+               Context("when the termination endpoint is not valid", func() {
                        Context("and the poll URL cannot be reached", func() {
                                BeforeEach(func() {
                                        nonReachable := "abc#1://localhost"

https://github.com/openshift/machine-api-provider-gcp/pull/61

Bug OCPBUGS-23339: The name for ImageDigestMirrorSet created by oc-mirror is not valid

Description of problem:

The name for ImageDigestMirrorSet created by oc-mirror is not valid

Version-Release number of selected component (if applicable):

4.15

How reproducible:

always

Steps to Reproduce:

1. Apply the IDMS YAML file created by oc-mirror; the command fails with an error.

Actual results:
cat out/working-dir/cluster-resources/idms_2023-11-16T04\:04\:49Z.yaml
apiVersion: config.openshift.io/v1
kind: ImageDigestMirrorSet
metadata:
  creationTimestamp: null
  name: idms_2023-11-16T04:04:49Z
spec:
  imageDigestMirrors:
  - mirrors:
    - ec2-3-143-247-94.us-east-2.compute.amazonaws.com:5000/ocp/openshift-release-dev
    source: quay.io/openshift-release-dev
  - mirrors:
    - ec2-3-143-247-94.us-east-2.compute.amazonaws.com:5000/ocp/openshift
    source: localhost:5005/openshift
status: {}

oc create -f out/working-dir/cluster-resources/idms_2023-11-16T04\:04\:49Z.yaml
The ImageDigestMirrorSet "idms_2023-11-16T04:04:49Z" is invalid: metadata.name: Invalid value: "idms_2023-11-16T04:04:49Z": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')

Expected results:

name valid and no error.
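A minimal sketch of validating and sanitizing a generated name, using the RFC 1123 subdomain regex quoted in the error message above and the 253-character limit Kubernetes applies to subdomain names. `sanitizeName` is a hypothetical helper, not the fix in openshift/oc-mirror#743; it maps every disallowed character (such as `_` and `:` in the timestamp) to `-`, and it does not repair inputs with adjacent separators such as `a..b`.

```go
package main

import (
	"regexp"
	"strings"
)

// dns1123Subdomain is the validation regex quoted in the API server error,
// anchored to match the whole name.
var dns1123Subdomain = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

const maxSubdomainLen = 253

// isValidSubdomain reports whether name is acceptable as metadata.name.
func isValidSubdomain(name string) bool {
	return len(name) <= maxSubdomainLen && dns1123Subdomain.MatchString(name)
}

// sanitizeName lowercases name, replaces disallowed characters with '-',
// and trims separators from both ends so the result starts and ends
// with an alphanumeric character.
func sanitizeName(name string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(name) {
		switch {
		case r >= 'a' && r <= 'z', r >= '0' && r <= '9', r == '-', r == '.':
			b.WriteRune(r)
		default:
			b.WriteRune('-')
		}
	}
	out := strings.Trim(b.String(), "-.")
	if len(out) > maxSubdomainLen {
		out = strings.TrimRight(out[:maxSubdomainLen], "-.")
	}
	return out
}
```

Applied to `idms_2023-11-16T04:04:49Z`, this yields `idms-2023-11-16t04-04-49z`, which passes validation.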

Additional info:

https://github.com/openshift/oc-mirror/pull/743

Bug OCPBUGS-19133: Update 4.15 ose-machine-os-images image to be consistent with ART

Please review the following PR: https://github.com/openshift/machine-os-images/pull/30

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/machine-os-images/pull/30

Bug OCPBUGS-19494: when ovn ipsec pod stop/restart it kills pluto preventing further IPsec IKE communication

Description of problem:

The ipsec container kills pluto even if pluto was started by systemd.

Version-Release number of selected component (if applicable):

on any 4.14 nightly

How reproducible:

every time

Steps to Reproduce:

1. enable N-S ipsec
2. enable E-W IPsec
3. kill/stop/delete one of the ipsec-host pods

Actual results:

pluto is killed on that host

Expected results:

pluto keeps running

Additional info:

https://github.com/yuvalk/cluster-network-operator/blob/37d1cc72f4f6cd999046bd487a705e6da31301a5/bindata/network/ovn-kubernetes/common/ipsec-host.yaml#L235
This line should be removed.

https://github.com/openshift/cluster-network-operator/pull/2015

Bug OCPBUGS-25949: CVO should continue to periodically fetch upstream Cincinnati despite Recommended=Unknown risks

This is a clone of issue OCPBUGS-25708. The following is the description of the original issue:
—

Description of problem:

Changes made for faster risk cache-warming (the ~~OCPBUGS-19512~~ series) introduced an unfortunate cycle:

1. Cincinnati serves vulnerable PromQL, like graph-data#4524.
2. Clusters pick up that broken PromQL, try to evaluate, and fail. Re-eval-and-fail loop continues.
3. Cincinnati PromQL fixed, like graph-data#4528.
4. Cases:

- (a) Before the cache-warming changes, and also after this bug's fix, Clusters pick up the fixed PromQL, try to evaluate, and start succeeding. Hooray!
- (b) Clusters with the cache-warming changes but without this bug's fix say "it's been a long time since we pulled fresh Cincinnati information, but it has not been long since my last attempt to eval this broken PromQL, so let me skip the Cincinnati pull and re-eval that old PromQL", which fails. Re-eval-and-fail loop continues.

Version-Release number of selected component (if applicable):

The regression went back via:

Updates from those releases (and later in their 4.y, until this bug lands a fix) to later releases are exposed.

How reproducible:

Likely very reproducible for exposed releases, but only when clusters are served PromQL risks that will consistently fail evaluation.

Steps to Reproduce:

1. Launch a cluster.
2. Point it at dummy Cincinnati data, as described in ~~OTA-520~~. Initially declare a risk with broken PromQL in that data, like cluster_operator_conditions.
3. Wait until the cluster is reporting Recommended=Unknown for those risks (oc adm upgrade --include-not-recommended).
4. Update the risk to working PromQL, like group(cluster_operator_conditions). Alternatively, update anything about the update-service data (e.g. adding a new update target with a path from the cluster's version).
5. Wait 10 minutes for the CVO to have plenty of time to pull that new Cincinnati data.
6. oc get -o json clusterversion version | jq '.status.conditionalUpdates[].risks[].matchingRules[].promql.promql' | sort | uniq | jq -r .

Actual results:

Exposed releases will still have the broken PromQL in their output (or will lack the new update target you added, or whatever the Cincinnati data change was).

Expected results:

Fixed releases will have picked up the fixed PromQL in their output (or will have the new update target you added, or whatever the Cincinnati data change was).

Additional info:

Identification

To detect exposure in collected Insights, look for EvaluationFailed conditionalUpdates like:

$ oc get -o json clusterversion version | jq -r '.status.conditionalUpdates[].conditions[] | select(.type == "Recommended" and .status == "Unknown" and .reason == "EvaluationFailed" and (.message | contains("invalid PromQL")))'
{
  "lastTransitionTime": "2023-12-15T22:00:45Z",
  "message": "Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34\nAdding a new worker node will fail for clusters running on ARO. https://issues.redhat.com/browse/MCO-958",
  "reason": "EvaluationFailed",
  "status": "Unknown",
  "type": "Recommended"
}

To confirm in-cluster vs. other EvaluationFailed invalid PromQL issues, you can look for Cincinnati retrieval attempts in CVO logs. Example from a healthy cluster:

$ oc -n openshift-cluster-version logs -l k8s-app=cluster-version-operator --tail -1 --since 30m | grep 'request updates from\|PromQL' | tail
I1221 20:36:39.783530       1 cincinnati.go:114] Using a root CA pool with 0 root CA subjects to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?...
I1221 20:36:39.831358       1 promql.go:118] evaluate PromQL cluster condition: "(\n  group(cluster_operator_conditions{name=\"aro\"})\n  or\n  0 * group(cluster_operator_conditions)\n)\n"
I1221 20:40:19.674925       1 cincinnati.go:114] Using a root CA pool with 0 root CA subjects to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?...
I1221 20:40:19.727998       1 promql.go:118] evaluate PromQL cluster condition: "(\n  group(cluster_operator_conditions{name=\"aro\"})\n  or\n  0 * group(cluster_operator_conditions)\n)\n"
I1221 20:43:59.567369       1 cincinnati.go:114] Using a root CA pool with 0 root CA subjects to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?...
I1221 20:43:59.620315       1 promql.go:118] evaluate PromQL cluster condition: "(\n  group(cluster_operator_conditions{name=\"aro\"})\n  or\n  0 * group(cluster_operator_conditions)\n)\n"
I1221 20:47:39.457582       1 cincinnati.go:114] Using a root CA pool with 0 root CA subjects to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?...
I1221 20:47:39.509505       1 promql.go:118] evaluate PromQL cluster condition: "(\n  group(cluster_operator_conditions{name=\"aro\"})\n  or\n  0 * group(cluster_operator_conditions)\n)\n"
I1221 20:51:19.348286       1 cincinnati.go:114] Using a root CA pool with 0 root CA subjects to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?...
I1221 20:51:19.401496       1 promql.go:118] evaluate PromQL cluster condition: "(\n  group(cluster_operator_conditions{name=\"aro\"})\n  or\n  0 * group(cluster_operator_conditions)\n)\n"

This shows fetch lines every few minutes. From an exposed cluster, by contrast, only PromQL evaluation lines appear:

$ oc -n openshift-cluster-version logs -l k8s-app=cluster-version-operator --tail -1 --since 30m | grep 'request updates from\|PromQL' | tail
I1221 20:50:10.165101       1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:11.166170       1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:12.166314       1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:13.166517       1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:14.166847       1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:15.167737       1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:16.168486       1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:17.169417       1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:18.169576       1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:19.170544       1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
$ oc -n openshift-cluster-version logs -l k8s-app=cluster-version-operator --tail -1 --since 30m | grep 'request updates from' | tail
...no hits...

Recovery

If affected, the remediation is to fix the invalid PromQL. For example, we fixed the AROBrokenDNSMasq expression in graph-data#4528. After that, the local cluster administrator should restart their CVO, for example with:

$ oc -n openshift-cluster-version delete -l k8s-app=cluster-version-operator pods

https://github.com/openshift/cluster-version-operator/pull/1013

Task OU-179: Fix the root cause of externalLabels not present on alerts

Background

To evaluate solutions for https://issues.redhat.com/browse/RFE-3953, we need to investigate the root cause of the issue.

Outcomes

If there is an issue, have a strategy to display external labels on alerts

https://github.com/openshift/monitoring-plugin/pull/53

Bug OCPBUGS-11344: Alertmanager service accounts auto mount token

Description of problem:

The ServiceAccounts for both the in-cluster and UWM Alertmanager set automountServiceAccountToken: true.
This should instead be set at the pod level, which requires a change in prometheus-operator and in its configuration of Alertmanager pods.

A similar change for Prometheus pods was implemented in https://github.com/prometheus-operator/prometheus-operator/pull/4514.

https://github.com/openshift/cluster-monitoring-operator/blob/7702f6c7d6e1409dea9197e63dafcb0decbe60b9/assets/alertmanager-user-workload/service-account.yaml#L2

https://github.com/openshift/cluster-monitoring-operator/blob/7702f6c7d6e1409dea9197e63dafcb0decbe60b9/assets/alertmanager/service-account.yaml#L2

https://github.com/openshift/cluster-monitoring-operator/pull/2111

Bug OCPBUGS-19200: Update 4.15 ose-openstack-cinder-csi-driver image to be consistent with ART

Please review the following PR: https://github.com/openshift/cloud-provider-openstack/pull/216

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cloud-provider-openstack/pull/216

Bug OCPBUGS-22199: Console login flow forgot query parameters / Deeplinking doesn't work

Description of problem:
The RHDP-Developer/DXP team wants to deep-link some catalog pages with a filter on the Developer Sandbox cluster. The target page was shown without any query parameter when the user wasn't logged in.

Version-Release number of selected component (if applicable):
At least 4.13 (Dev Sandbox clusters run 4.13.13 currently.)

How reproducible:
Always when not logged in

Steps to Reproduce:

1. Log in.
2. Switch to the Developer perspective.
3. Navigate to Add > Developer Catalog > Builder Images and add a filter for ".NET" (for example).
   Users are assigned to different clusters, so the exact URL isn't known, but the path and query parameters should look like this:
   ```
   /catalog/ns/cjerolim-dev?catalogType=BuilderImage&keyword=.NET
   ```
4. Save the full URL, including these query parameters.
5. Log out.
6. Enter the full URL from above.
7. Log in.

Actual results:
The Developer Catalog is opened, but the catalog type "Builder Images" and the keyword filter ".NET" are not applied.

All Developer Catalog items are shown.

Expected results:
The Developer Catalog should open with the catalog type "Builder Images" and the keyword filter ".NET" applied.

Exactly one catalog item should be shown.

Additional info:

https://github.com/openshift/console/pull/13268

Bug OCPBUGS-23462: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-kube-controller-manager-operator/pull/771

Bug OCPBUGS-25706: Archived in Tekton Results icon is not shown in list and details page for PipelineRuns imported from Tekton Results db

This is a clone of issue OCPBUGS-25396. The following is the description of the original issue:
—
Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/console/pull/13461

Bug OCPBUGS-19169: Update 4.15 ose-cluster-autoscaler-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-autoscaler-operator/pull/286

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-autoscaler-operator/pull/286

Bug OCPBUGS-20178: Use a private IPv4 address range for the transit switch subnet in OVN IC

The IP range 168.254.0.0/16 that we chose as default for the transit switch is a public one. Let's use a private one instead, making sure it won't collide with address blocks already in use.

In the future we might want to make this configurable, but for now let's just make sure we pick an IP range that is not used elsewhere in openshift.

https://github.com/openshift/ovn-kubernetes/pull/1931

Bug OCPBUGS-21760: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/kube-state-metrics/pull/100

Bug OCPBUGS-23878: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/openstack-cinder-csi-driver-operator/pull/141

Bug OCPBUGS-23015: KAS HSTS is not configured on Hypershift control planes

https://github.com/openshift/cluster-kube-apiserver-operator/pull/1392

This PR configured HSTS for the KAS in standalone clusters, and HyperShift control planes need to follow suit.

https://github.com/openshift/hypershift/pull/3088

Bug OCPBUGS-13152: Unnecessary API calls if TektonConfig is not minimal

Description of problem:
With ~~OCPBUGS-11099~~ our Pipeline Plugin supports the TektonConfig config "embedded-status: minimal" option that will be the default in OpenShift Pipelines 1.11+.

Since this change, however, the Pipeline pages load the TaskRuns for every Pipeline and PipelineRun row. To decrease the risk of a performance issue, we should make this call only if status.tasks isn't defined.

Version-Release number of selected component (if applicable):

4.12-4.14, as soon as ~~OCPBUGS-11099~~ is backported.
Tested with Pipelines operator 1.10.1

How reproducible:
Always

Steps to Reproduce:

Install Pipelines operator
Import a Git repository and enable the Pipeline option
Open the browser network inspector
Navigate to the Pipeline page

Actual results:
The list page loads a list of TaskRuns for each Pipeline / PipelineRun, even if the PipelineRun already contains the related data (status.tasks).

Expected results:
No unnecessary network calls. When the admin changes the TektonConfig "embedded-status" option to minimal, the UI should still work and load the TaskRuns as it does today.
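The expected behavior reduces to one rule per row: request TaskRuns only when the PipelineRun does not already carry the task data. A minimal sketch in Go; `pipelineRunStatus` is a hypothetical, trimmed-down type (the console implements this in its own frontend code), and "not defined" is modeled as a nil slice, as JSON decoding would leave an absent field.

```go
package main

// pipelineRunStatus is a hypothetical, trimmed-down PipelineRun status.
// Tasks stands in for status.tasks: populated when task data is embedded,
// nil when the field is absent (as with embedded-status: minimal).
type pipelineRunStatus struct {
	Tasks []string
}

// needsTaskRunFetch reports whether the UI must issue an extra request
// for TaskRuns: only when status.tasks is not defined at all.
func needsTaskRunFetch(s pipelineRunStatus) bool {
	return s.Tasks == nil
}

// countTaskRunRequests returns how many extra TaskRun requests a list
// page showing these PipelineRuns would make under that rule.
func countTaskRunRequests(runs []pipelineRunStatus) int {
	n := 0
	for _, r := range runs {
		if needsTaskRunFetch(r) {
			n++
		}
	}
	return n
}
```

Rows that already embed task data cost no extra requests, while minimal-status rows still get their TaskRuns loaded, which covers both halves of the expected result.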

Additional info:
None

https://github.com/openshift/console/pull/13065

Bug OCPBUGS-17669: Validate Cluster Name in HostedCluster Controller

Description of problem:

The HostedCluster name is not currently validated against RFC1123.

Version-Release number of selected component (if applicable):

How reproducible:

Every time

Steps to Reproduce:

1.
2.
3.

Actual results:

Any HostedCluster name is allowed

Expected results:

Only HostedCluster names meeting RFC1123 validation should be allowed.

Additional info:

https://github.com/openshift/hypershift/pull/3036

Bug OCPBUGS-19552: OKD: Agent-based Installer is broken for HA-deployments of OKD/FCOS when api-int.* endpoint is not defined

Description of problem:

Agent-based Installer fails to deploy an HA cluster (3x masters, 2x workers) with OKD/FCOS when the network DNS server does not resolve the api-int.* endpoint. That endpoint is not required for HA deployments and is never mentioned in the OCP docs for the Agent-based Installer. OCP is not affected at all.

Version-Release number of selected component (if applicable):

4.13
4.14
4.15

https://github.com/openshift/installer/pull/7516

Bug OCPBUGS-20533: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-network-operator/pull/2074

Bug OCPBUGS-18999: Intermittent 504 Gateway Time-out

Description of problem:

Image pulls fail with HTTP status 504 (Gateway Time-out) until the image registry pods are restarted.

Version-Release number of selected component (if applicable):

4.13.12

How reproducible:

Intermittent

Steps to Reproduce:

1.
2.
3.

Actual results:

Images can't be pulled:

$ podman pull registry.ci.openshift.org/ci/applyconfig:latest
Trying to pull registry.ci.openshift.org/ci/applyconfig:latest...
Getting image source signatures
Error: reading signatures: downloading signatures for sha256:83c1b636069c3302f5ba5075ceeca5c4a271767900fee06b919efc3c8fa14984 in registry.ci.openshift.org/ci/applyconfig: received unexpected HTTP status: 504 Gateway Time-out

The image registry pods log errors:
time="2023-09-01T02:25:39.596485238Z" level=warning msg="error authorizing context: access denied" go.version="go1.19.10 X:strictfipsruntime" http.request.host=registry.ci.openshift.org http.request.id=3e805818-515d-443f-8d9b-04667986611d http.request.method=GET http.request.remoteaddr=18.218.67.82 http.request.uri="/v2/ocp/4-dev-preview/manifests/sha256:caf073ce29232978c331d421c06ca5c2736ce5461962775fdd760b05fb2496a0" http.request.useragent="containers/5.24.1 (github.com/containers/image)" vars.name=ocp/4-dev-preview vars.reference="sha256:caf073ce29232978c331d421c06ca5c2736ce5461962775fdd760b05fb2496a0"

Expected results:

Image registry does not return gateway timeouts

Additional info:

Must-gather(s) attached; additional information is in the linked OHSS ticket.

https://github.com/openshift/image-registry/pull/380

Bug OCPBUGS-23796: not possible to drain a master node after multiple master nodes experience network disruption

Description of problem:

- upgrade the cluster
- 2 or more kube-apiserver pods do not come online. Network access could be lost due to a misconfiguration or a bad RHEL update. We can simulate this with:
    ssh into a node
    run iptables -A INPUT -p tcp --destination-port 6443 -j DROP
- 2 or more kube-apiserver-guard pods lose readiness
- the kube-apiserver-guard-pdb PDB blocks the node drain because status.currentHealthy is less than status.desiredHealthy
- it is not possible to drain the node without overriding eviction requests (forcefully deleting the guard pods)

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

See the description above.

Actual results:

evicting pod openshift-kube-apiserver/kube-apiserver-guard-ip-10-0-19-181.eu-north-1.compute.internal
    error when evicting pods/"kube-apiserver-guard-ip-10-0-19-181.eu-north-1.compute.internal" -n "openshift-kube-apiserver" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.

Expected results:

it is possible to evict the unready pods
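The drain is blocked by ordinary PodDisruptionBudget semantics. Below is a simplified Python model of the eviction check for a Running pod, based on the Kubernetes `unhealthyPodEvictionPolicy` field. Under the default `IfHealthyBudget` policy, an unready pod can be evicted only while `currentHealthy >= desiredHealthy`. Under `AlwaysAllow`, unready pods can be evicted regardless of the budget. Setting `AlwaysAllow` on the guard PDB is one plausible fix, but this sketch does not confirm that the linked PR takes that approach. The pod counts below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PDBStatus:
    current_healthy: int
    desired_healthy: int

    @property
    def disruptions_allowed(self) -> int:
        # The budget never goes negative.
        return max(0, self.current_healthy - self.desired_healthy)


def eviction_allowed(status: PDBStatus, pod_ready: bool,
                     policy: str = "IfHealthyBudget") -> bool:
    """Simplified eviction decision for a Running pod covered by a PDB."""
    if pod_ready:
        # Healthy pods always consume the disruption budget.
        return status.disruptions_allowed > 0
    if policy == "AlwaysAllow":
        # Unready pods are already "disrupted", so evicting them is allowed.
        return True
    # IfHealthyBudget: unready pods only if the application is not disrupted.
    return status.current_healthy >= status.desired_healthy


# Hypothetical: 3 guard pods, 2 required, 2 of them lost readiness.
status = PDBStatus(current_healthy=1, desired_healthy=2)
print(eviction_allowed(status, pod_ready=False))                        # False
print(eviction_allowed(status, pod_ready=False, policy="AlwaysAllow"))  # True
```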

Additional info:

https://github.com/openshift/cluster-kube-apiserver-operator/pull/1579

Story AGENT-337: Support both IPv4 and IPv6 VIPs simultaneously

A change to the installConfig in 4.12 means a user can now specify both an IPv4 and IPv6 address for the API and/or Ingress VIPs when running dual-stack on the baremetal or vsphere platforms. (Previously, only an IPv4 VIP could be used on dual-stack clusters.)

Once the assisted-service and ZTP support this, we'll want to allow passing that information through.
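Below is a rough Python sketch of the kind of check that the dual-stack VIP lists imply. It assumes that each `apiVIPs` or `ingressVIPs` list holds one or two addresses, and that two addresses must belong to different IP families. Other installer rules, such as VIP ordering relative to the primary machine network, are deliberately not modeled. `validate_vips` is a hypothetical helper.

```python
import ipaddress
from typing import List


def validate_vips(vips: List[str]) -> List[str]:
    """Return validation errors for an apiVIPs/ingressVIPs-style list."""
    errors: List[str] = []
    if not 1 <= len(vips) <= 2:
        errors.append("expected one VIP, or two VIPs on dual-stack clusters")
    versions = []
    for vip in vips:
        try:
            versions.append(ipaddress.ip_address(vip).version)
        except ValueError:
            errors.append(f"{vip!r} is not a valid IP address")
    if len(versions) == 2 and versions[0] == versions[1]:
        errors.append("two VIPs must be one IPv4 and one IPv6 address")
    return errors
```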

Bug OCPBUGS-19052: Avoid caching etcdctl on cluster-backup.sh

Description of problem:

With OCPBUGS-18274 we had to update the etcdctl binary. Unfortunately, the script does not attempt to update the binary if it is already found in the PATH:

https://github.com/openshift/cluster-etcd-operator/blob/master/bindata/etcd/etcd-common-tools#L16-L24

This causes confusion as the binary might not be the latest that we're shipping with etcd.

Pulling the binary should not be a big deal: etcd is running locally anyway, so the local image should already be cached. We should always replace the binary.
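Below is a Python sketch of the requested behaviour: always replace the binary rather than trusting whatever is already on the PATH. It is not a port of `etcd-common-tools`, which is a shell script. `ensure_etcdctl` is hypothetical, as is the `extract_from_image` callback, which stands in for copying the binary out of the locally cached etcd image.

```python
from pathlib import Path
from typing import Callable


def ensure_etcdctl(dest: Path, extract_from_image: Callable[[Path], None],
                   always_refresh: bool = True) -> str:
    """Install etcdctl at `dest`, replacing any existing copy by default."""
    if dest.exists() and not always_refresh:
        # Old behaviour: a stale binary is kept even after the etcd image changes.
        return "etcdctl is already installed"
    staging = dest.with_name(dest.name + ".new")
    extract_from_image(staging)  # copy the binary shipped in the current image
    staging.chmod(0o755)
    staging.replace(dest)        # atomic rename on the same filesystem
    return "etcdctl refreshed"
```

Writing to a staging file and renaming it means a concurrently running backup never sees a half-written binary.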

Version-Release number of selected component (if applicable):

any currently supported release

How reproducible:

always

Steps to Reproduce:

1. run cluster-backup.sh to download the binary
2. update the etcd image (e.g. use a different version)
3. run cluster-backup.sh again

Actual results:

cluster-backup.sh will simply print "etcdctl is already installed"

Expected results:

etcdctl should always be pulled

Additional info:

Bug OCPBUGS-20305: Extra space is in the translation text(Chinese) of 'Create rolebinding' and 'replicate rolebinding'

Description of problem:

There is an extra space in the Chinese translation of 'Duplicate RoleBinding' in the kebab menu.

The changes from PR https://github.com/openshift/console/pull/12099 were, for some reason, not included in the master and release-4.12 through release-4.14 branches.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-10-08-220853

How reproducible:

Always

Steps to Reproduce:

1. Log in to OCP and change the language to Chinese.
2. Navigate to the RoleBindings page, choose a RoleBinding, click the kebab icon at the end of the row, and check the translated text of 'Duplicate RoleBinding'.

Actual results:

At step 2, the text is shown as '重复 角色绑定' ("Duplicate RoleBinding") and '重复 集群角色绑定' ("Duplicate ClusterRoleBinding").

Expected results:

The extra space is removed, and the text is shown as '重复角色绑定' and '重复集群角色绑定'.
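Stray spaces like these can be caught mechanically. Below is a hypothetical Python lint helper; it is not part of the console build. It flags or removes whitespace that sits between two CJK Unified Ideographs, assuming the basic U+4E00 to U+9FFF block covers these strings.

```python
import re

# Whitespace with a CJK Unified Ideograph on both sides (basic block only).
CJK_GAP = re.compile(r"(?<=[\u4e00-\u9fff])\s+(?=[\u4e00-\u9fff])")


def strip_cjk_gaps(text: str) -> str:
    """Remove spaces that split a run of Chinese characters."""
    return CJK_GAP.sub("", text)


def find_cjk_gaps(translations: dict) -> list:
    """Return the keys whose translated value contains such a gap."""
    return [key for key, value in translations.items() if CJK_GAP.search(value)]
```

Spaces between Chinese and Latin text, such as "重复 RoleBinding", are left alone, because only gaps with an ideograph on both sides match.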

Additional info:

https://github.com/openshift/console/pull/13236

Bug OCPBUGS-19136: Update 4.15 ose-cluster-openshift-controller-manager-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-openshift-controller-manager-operator/pull/304

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Bug OCPBUGS-23685: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-api-provider-ibmcloud/pull/72

Bug OCPBUGS-23745: monitoring ClusterOperator should not blip Available=False on quick etcd leader changes

Description of problem:

Seen in 4.15 update CI:

: [bz-Monitoring] clusteroperator/monitoring should not change condition/Available
Run #0: Failed 1h16m1s
{ 1 unexpected clusteroperator state transitions during e2e test run

Nov 21 04:20:56.837 - 19s E clusteroperator/monitoring condition/Available reason/UpdatingPrometheusK8SFailed status/False reconciling Prometheus Federate Route failed: retrieving Route object failed: etcdserver: leader changed}

While the Kube API server is supposed to buffer clients from etcd leader transitions, an issue that persists for only 19s is not long enough to warrant immediate admin intervention. The monitoring operator should stay Available=True for this kind of brief hiccup. It should still go Available=False for issues where at least part of the component is non-functional and the condition requires immediate administrator intervention. This would make it easier for admins and SREs operating clusters to identify when intervention is required.
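One way to implement this tolerance is to report Available=False only after reconciliation has been failing continuously for longer than a grace period. Below is a minimal Python sketch. The class name and the 120-second default are illustrative assumptions, not values taken from cluster-monitoring-operator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AvailabilityDebouncer:
    grace_seconds: float = 120.0           # hypothetical tolerance window
    failing_since: Optional[float] = None  # start of the current failure streak

    def observe(self, reconcile_ok: bool, now: float) -> bool:
        """Record one reconcile result and return the Available value to report."""
        if reconcile_ok:
            self.failing_since = None
            return True
        if self.failing_since is None:
            self.failing_since = now
        # Stay Available=True until the failure has persisted past the grace period.
        return now - self.failing_since < self.grace_seconds
```

With this model, the 19-second "etcdserver: leader changed" blip above would never flip the condition. A Route lookup that keeps failing for several minutes still would.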

Version-Release number of selected component (if applicable):

A bunch of 4.15 jobs are impacted, almost all update jobs:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=48h&type=junit&search=clusteroperator/monitoring+should+not+change+condition/Available' | grep '^periodic-.*4[.]15.*failures match' | sort
periodic-ci-openshift-cluster-etcd-operator-release-4.15-periodics-e2e-aws-etcd-recovery (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
periodic-ci-openshift-hypershift-release-4.15-periodics-e2e-kubevirt-conformance (all) - 2 runs, 50% failed, 200% of failures match = 100% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-nightly-4.14-ocp-ovn-remote-libvirt-s390x (all) - 6 runs, 100% failed, 33% of failures match = 33% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-aws-ovn-heterogeneous-upgrade (all) - 5 runs, 40% failed, 50% of failures match = 20% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-upgrade-azure-ovn-heterogeneous (all) - 5 runs, 20% failed, 100% of failures match = 20% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-aws-upgrade-ovn-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade (all) - 50 runs, 56% failed, 4% of failures match = 2% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-azure-sdn-upgrade (all) - 5 runs, 20% failed, 100% of failures match = 20% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-gcp-ovn-upgrade (all) - 80 runs, 44% failed, 9% of failures match = 4% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-aws-ovn-upgrade (all) - 80 runs, 30% failed, 17% of failures match = 5% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-azure-ovn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-azure-sdn-upgrade (all) - 80 runs, 43% failed, 38% of failures match = 16% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-ovn-rt-upgrade (all) - 52 runs, 15% failed, 175% of failures match = 27% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-ovn-upgrade (all) - 5 runs, 20% failed, 100% of failures match = 20% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-sdn-upgrade (all) - 5 runs, 60% failed, 33% of failures match = 20% impact
periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-ovn-single-node-serial (all) - 5 runs, 100% failed, 40% of failures match = 40% impact
periodic-ci-openshift-release-master-nightly-4.15-e2e-ibmcloud-csi (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.15-upgrade-from-stable-4.14-e2e-aws-sdn-upgrade (all) - 5 runs, 20% failed, 100% of failures match = 20% impact
periodic-ci-openshift-release-master-nightly-4.15-upgrade-from-stable-4.14-e2e-aws-upgrade-ovn-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.15-upgrade-from-stable-4.14-e2e-metal-ipi-sdn-bm-upgrade (all) - 5 runs, 100% failed, 20% of failures match = 20% impact
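The three percentages in each row appear to be derived from run counts, as follows:

- % failed = failed / runs
- % of failures match = matching / failed
- impact = matching / runs

This would explain why "% of failures match" can exceed 100%. This reading matches the rows checked, but it is an inference, not documented search.ci behaviour. The counts in the example are back-calculated from the rows above.

```python
from typing import Tuple


def pct(numerator: int, denominator: int) -> int:
    """Percentage rounded half up, as the rows above appear to be."""
    return int(100 * numerator / denominator + 0.5)


def search_row(runs: int, failed: int, matching: int) -> Tuple[int, int, int]:
    """Return (% failed, % of failures match, % impact) for one job."""
    return pct(failed, runs), pct(matching, failed), pct(matching, runs)


# gcp-ovn-rt-upgrade: 52 runs, 8 failed, 14 matching
print(search_row(52, 8, 14))   # (15, 175, 27)
# azure-sdn-upgrade: 80 runs, 34 failed, 13 matching
print(search_row(80, 34, 13))  # (43, 38, 16)
```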

Hit rates are low enough there that I haven't checked older 4.y. I'm not sure whether all of those hits are UpdatingPrometheusK8SFailed; it seems likely that Kube API hiccups could impact a number of control loops. And there may be other triggers going on besides Kube API hiccups.

How reproducible:

16% impact in periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-azure-sdn-upgrade looks like the current largest impact percentage among the jobs with double-digit run counts.

Steps to Reproduce:

Run periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-azure-sdn-upgrade or another job with a combination of high-ish impact percentage and high run counts, watching the monitoring ClusterOperator's Available condition.

Actual results:

Blips of Available=False that resolve more quickly than a responding admin could be expected to show up.

Expected results:

Only going Available=False when it seems reasonable to summon an emergency admin response.

Additional info:

I have no problem if folks decide to push for Kube API server / etcd perfection, but that seems like a hard goal to reach reliably in the mess of the real world, so even if you do push those folks for improvements, I think it makes sense to relax your response to those kinds of issues to only complain when things like Route object retrieval failures go on for long enough for the operator to be seriously

https://github.com/openshift/cluster-monitoring-operator/pull/2179

Task OSASINFRA-3294: UPI docs script to update bootstrap ignition shim is broken

The official OpenShift documentation does not contain this issue (https://docs.openshift.com/container-platform/4.14/installing/installing_openstack/installing-openstack-user.html); only the upstream docs have it.

https://github.com/openshift/installer/pull/7743

Bug OCPBUGS-23461: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-etcd-operator/pull/1161

Bug OCPBUGS-24099: Update 4.15 ose-cluster-csi-snapshot-controller-operator-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-csi-snapshot-controller-operator/pull/176

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-csi-snapshot-controller-operator/pull/176

Bug OCPBUGS-18357: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-etcd-operator/pull/1105

Bug OCPBUGS-22778: All resources' yaml tab show TypeError after MCE operator is installed

This bug fix is in conjunction with https://issues.redhat.com/browse/OCPBUGS-16736

https://github.com/openshift/console/pull/13346

Bug OU-261: monitoring-plugin: The "Overwriting current silence" message should have padding

The "Overwriting current silence" information alert should have padding to be consistent with other alert messages.

https://github.com/openshift/monitoring-plugin/pull/74

Bug OCPBUGS-18339: Machine API Operator vSphere controller references retired KCS for HW Version Migrations

Description of problem:

The vSphere code references a Red Hat solution that has been retired because its content was merged into the official documentation.

https://github.com/openshift/machine-api-operator/blob/master/pkg/controller/vsphere/reconciler.go#L827

Version-Release number of selected component (if applicable):

4.11-4.13 + main

How reproducible:

Always

Steps to Reproduce:

1.
2.
3.

Actual results:

The UI presents a message linking to a solution that customers cannot access.

Hardware lower than 15 is not supported, clone stopped. Detected machine template version is 13. Please update machine template: https://access.redhat.com/articles/6090681

Expected results:

The message should reference the official documentation: https://docs.openshift.com/container-platform/4.12/updating/updating-hardware-on-nodes-running-on-vsphere.html

Additional info:

Bug OCPBUGS-19346: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/hypershift/pull/3021

Bug OCPBUGS-25601: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/aws-ebs-csi-driver/pull/251

Bug OCPBUGS-26210: LB not getting External-IP

This is a clone of issue OCPBUGS-25483. The following is the description of the original issue:
—
Description of problem:

A regression was identified when creating LoadBalancer services in new ARO 4.14 clusters (handled for new installations in OCPBUGS-24191).

The same regression has also been confirmed in ARO clusters upgraded to 4.14.

Version-Release number of selected component (if applicable):

4.14.z

How reproducible:

On any ARO cluster upgraded to 4.14.z

Steps to Reproduce:

    1. Install an ARO cluster
    2. Upgrade to 4.14 from fast channel
    3. oc create svc loadbalancer test-lb -n default --tcp 80:8080

Actual results:

# External-IP stuck in Pending
$ oc get svc test-lb -n default
NAME      TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
test-lb   LoadBalancer   172.30.104.200   <pending>     80:30062/TCP   15m


# Errors in cloud-controller-manager being unable to map VM to nodes
$ oc logs -l infrastructure.openshift.io/cloud-controller-manager=Azure  -n openshift-cloud-controller-manager
I1215 19:34:51.843715       1 azure_loadbalancer.go:1533] reconcileLoadBalancer for service(default/test-lb) - wantLb(true): started
I1215 19:34:51.844474       1 event.go:307] "Event occurred" object="default/test-lb" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
I1215 19:34:52.253569       1 azure_loadbalancer_repo.go:73] LoadBalancerClient.List(aro-r5iks3dh) success
I1215 19:34:52.253632       1 azure_loadbalancer.go:1557] reconcileLoadBalancer for service(default/test-lb): lb(aro-r5iks3dh/mabad-test-74km6) wantLb(true) resolved load balancer name
I1215 19:34:52.528579       1 azure_vmssflex_cache.go:162] Could not find node () in the existing cache. Forcely freshing the cache to check again...
E1215 19:34:52.714678       1 azure_vmssflex.go:379] fs.GetNodeNameByIPConfigurationID(/subscriptions/fe16a035-e540-4ab7-80d9-373fa9a3d6ae/resourceGroups/aro-r5iks3dh/providers/Microsoft.Network/networkInterfaces/mabad-test-74km6-master0-nic/ipConfigurations/pipConfig) failed. Error: failed to map VM Name to NodeName: VM Name mabad-test-74km6-master-0
E1215 19:34:52.714888       1 azure_loadbalancer.go:126] reconcileLoadBalancer(default/test-lb) failed: failed to map VM Name to NodeName: VM Name mabad-test-74km6-master-0
I1215 19:34:52.714956       1 azure_metrics.go:115] "Observed Request Latency" latency_seconds=0.871261893 request="services_ensure_loadbalancer" resource_group="aro-r5iks3dh" subscription_id="fe16a035-e540-4ab7-80d9-373fa9a3d6ae" source="default/test-lb" result_code="failed_ensure_loadbalancer"
E1215 19:34:52.715005       1 controller.go:291] error processing service default/test-lb (will retry): failed to ensure load balancer: failed to map VM Name to NodeName: VM Name mabad-test-74km6-master-0

Expected results:

# The LoadBalancer gets an External-IP assigned
$ oc get svc test-lb -n default 
NAME         TYPE           CLUSTER-IP       EXTERNAL-IP                            PORT(S)        AGE 
test-lb      LoadBalancer   172.30.193.159   20.242.180.199                         80:31475/TCP   14s

Additional info:

In the cloud-provider-config ConfigMap in the openshift-config namespace, vmType is set to "".

When vmType gets changed to "standard" explicitly, the provisioning of the LoadBalancer completes and an ExternalIP gets assigned without errors.

https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/319

Bug OCPBUGS-4038: OKD: skip enabling gatewayd.socket

Description of problem:

The OKD installer attempts to enable systemd-journal-gatewayd.socket, which is not present on FCOS.

Version-Release number of selected component (if applicable):

4.13

Bug OCPBUGS-17676: Pod Logs in OpenShift Web Console do not maintain white-space

Description of problem:

When multiple consecutive spaces are present in Pod logs, the spaces are collapsed and white-space is not retained when reviewing logs via the OpenShift Web Console. The white-space is retained when reviewing via the 'raw' output and via the `oc logs` command but the white-space is collapsed when reviewing via the `logs` panel in the OpenShift Web Console. 

This mangles the output of tables in the logs.

Version-Release number of selected component (if applicable):

4.13

How reproducible:

Every time

Steps to Reproduce:

1. Create a Pod which outputs a table in the logs
2. Review the output table in the Pod logs via the OpenShift Web Console

Actual results:

The spaces in the table are collapsed

Expected results:

The table formatting should be maintained

Additional info:

- During testing, I added `white-space: pre` styling to the log lines, which resolved the white-space issues. The logs do not appear to be styled to retain white-space formatting.
- Tested on OCP 4.10.53 and 4.13.4 and both have the issue

https://github.com/openshift/console/pull/13101

Bug OCPBUGS-18969: SNO fails install because image-registry operator is degraded - "Degraded: The registry is removed..."

Description of problem:

While installing many SNOs via ZTP using ACM, two SNOs failed to complete install because the image-registry was degraded during the install process.

# cat clusters | xargs -I % sh -c "echo '%'; oc --kubeconfig /root/hv-vm/kc/%/kubeconfig get clusterversion"
vm01831
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       False         18h     Error while reconciling 4.14.0-rc.0: the cluster operator image-registry is degraded
vm02740
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       False         18h     Error while reconciling 4.14.0-rc.0: the cluster operator image-registry is degraded

# cat clusters | xargs -I % sh -c "echo '%'; oc --kubeconfig /root/hv-vm/kc/%/kubeconfig get co image-registry"
vm01831
NAME             VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
image-registry   4.14.0-rc.0   True        False         True       18h     Degraded: The registry is removed...
vm02740
NAME             VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
image-registry   4.14.0-rc.0   True        False         True       18h     Degraded: The registry is removed...

Both showed the image-pruner job pod in error state:
# cat clusters | xargs -I % sh -c "echo '%'; oc --kubeconfig /root/hv-vm/kc/%/kubeconfig get po -n openshift-image-registry"
vm01831
NAME                                               READY   STATUS    RESTARTS   AGE
cluster-image-registry-operator-5d497944d4-czn64   1/1     Running   0          18h
image-pruner-28242720-w6jmv                        0/1     Error     0          18h
node-ca-vtfj8                                      1/1     Running   0          18h
vm02740
NAME                                               READY   STATUS    RESTARTS      AGE
cluster-image-registry-operator-5d497944d4-lbtqw   1/1     Running   1 (18h ago)   18h
image-pruner-28242720-ltqzk                        0/1     Error     0             18h
node-ca-4fntj                                      1/1     Running   0             18h

Version-Release number of selected component (if applicable):

Deployed SNO OCP - 4.14.0-rc.0
Hub 4.13.11
ACM - 2.9.0-DOWNSTREAM-2023-09-07-04-47-52

How reproducible:

Rare, only 2 clusters were found in this state after the test

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

It seems that some permissions may have been missing:

# oc --kubeconfig /root/hv-vm/kc/vm01831/kubeconfig logs -n openshift-image-registry image-pruner-28242720-w6jmv
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:openshift-image-registry:pruner" cannot list resource "pods" in API group "" at the cluster scope: RBAC: clusterrole.rbac.authorization.k8s.io "system:image-pruner" not found
attempt #1 has failed (exit code 1), going to make another attempt...
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:openshift-image-registry:pruner" cannot list resource "pods" in API group "" at the cluster scope: RBAC: clusterrole.rbac.authorization.k8s.io "system:image-pruner" not found
attempt #2 has failed (exit code 1), going to make another attempt...
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:openshift-image-registry:pruner" cannot list resource "pods" in API group "" at the cluster scope: RBAC: clusterrole.rbac.authorization.k8s.io "system:image-pruner" not found
attempt #3 has failed (exit code 1), going to make another attempt...
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:openshift-image-registry:pruner" cannot list resource "pods" in API group "" at the cluster scope: RBAC: clusterrole.rbac.authorization.k8s.io "system:image-pruner" not found
attempt #4 has failed (exit code 1), going to make another attempt...
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:openshift-image-registry:pruner" cannot list resource "pods" in API group "" at the cluster scope: RBAC: clusterrole.rbac.authorization.k8s.io "system:image-pruner" not found
attempt #5 has failed (exit code 1), going to make another attempt...
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:openshift-image-registry:pruner" cannot list resource "pods" in API group "" at the cluster scope: RBAC: clusterrole.rbac.authorization.k8s.io "system:image-pruner" not found

Bug OCPBUGS-19125: Update 4.15 ose-cluster-authentication-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-authentication-operator/pull/634

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-authentication-operator/pull/634

Bug OCPBUGS-19185: Update 4.15 ose-aws-cluster-api-controllers image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-api-provider-aws/pull/477

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-api-provider-aws/pull/477

Bug OCPBUGS-9285: Documentation: Help Explain OpenShift Console List & Detail Resource Pages For Plugin Developers

The issue:

An interesting issue came up on #forum-ui-extensibility. There was an attempt to use extensions to nest a details page under a details page that contained a horizontal nav. This caused a problem rendering the page content when a sub-link was clicked, which led to confusion.

The why:

This happened because the resource details page had a tab containing a resource list page. That list page showed a number of CRs that, when clicked, appended their name to the URL. The navigation interpreted this longer path as another tab, so no tab was selected and no content was visible. The goal was to reuse this longer path as a details page of its own, with its own horizontal nav. The issue stems from a conceptual misunderstanding of how list and details pages work in OpenShift Console.

List Pages are sometimes reached via direct navigation links. Almost all list pages are also available from the Search page, which lets a user navigate both to resources that have nav items and to other, non-primary resources.

Details Pages show individual items (rows) from the List Pages. They are standalone pages that show the details of a single CR and can optionally have tabs that list other resources. Clicking an item in such a tab always transitions to a fresh Details page instead of compounding on the currently visible one.

The ask:

If we document this in a way that helps plugin developers share the same UX as the rest of the Console, we will have a more unified UX across the Console and any installed plugins.

https://github.com/openshift/console/pull/13109

Bug OCPBUGS-26045: ART-8361: Replace genisoimage with xorriso in 4.15

This is a duplicate of https://issues.redhat.com/browse/ART-8361. Because we cannot set `target` on ART bugs, the issue is created here.

https://github.com/openshift/cluster-api-provider-libvirt/pull/272

Bug OCPBUGS-13206: 'customPythonDeploymentConfig' YAML script to create a Pipeline not working

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/console/pull/13149

Bug OCPBUGS-19199: Update 4.15 ose-alibaba-cloud-csi-driver image to be consistent with ART

Please review the following PR: https://github.com/openshift/alibaba-cloud-csi-driver/pull/33

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/alibaba-cloud-csi-driver/pull/33

Bug OCPBUGS-19262: Update 4.15 ose-cluster-image-registry-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-image-registry-operator/pull/918

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-image-registry-operator/pull/918

Task HOSTEDCP-1212: Bump Golang to v1.20

Bump Golang to v1.20

https://github.com/openshift/hypershift/pull/3038

Bug OCPBUGS-19223: Update 4.15 ose-csi-external-snapshotter image to be consistent with ART

Please review the following PR: https://github.com/openshift/csi-external-snapshotter/pull/105

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/csi-external-snapshotter/pull/105

Bug OCPBUGS-26544: Ingress operator should use granular roles on GCP

Description of problem

The Ingress Operator should use granular roles in its CredentialsRequest per CCO-249. A change to use granular roles merged after the release-4.15 branch cut. This change needs to be backported for 4.15.0.

Version-Release number of selected component (if applicable)

4.15.0

How reproducible

Easily.

Steps to Reproduce

1. Launch an OCP 4.15 cluster on GCP.
2. Check the ingress operator's CredentialsRequest: oc get -n openshift-cloud-credential-operator credentialsrequests/openshift-ingress-gcp -o yaml

Actual results

The CredentialsRequest uses a predefined role:

spec:
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: GCPProviderSpec
    predefinedRoles:
    - roles/dns.admin

Expected results

The CredentialsRequest should specify the individual permissions that the operator requires:

spec:
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: GCPProviderSpec
    permissions:
    - dns.changes.create
    - dns.resourceRecordSets.create
    - dns.resourceRecordSets.update
    - dns.resourceRecordSets.delete
    - dns.resourceRecordSets.list

Additional info

https://github.com/openshift/cluster-ingress-operator/pull/844 merged in the master branch for 4.16 and needs to be backported to the release-4.15 branch.

https://github.com/openshift/cluster-ingress-operator/pull/1015

Bug OCPBUGS-19746: Add a network validation to avoid overlapping when you define KAS Advertise Address

Description of problem:

When deploying a HostedCluster with a KAS AdvertiseAddress defined, the address can overlap with other networks, such as the Service, Cluster, or Machine network, causing the deployment to fail.

Version-Release number of selected component (if applicable):

latest
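The validation requested above can be sketched as a CIDR containment check. The following Go example is a minimal sketch, not the HyperShift implementation: the `validateAdvertiseAddress` helper, the `namedCIDR` type, and the example CIDRs are hypothetical, and only the standard library `net/netip` package (Go 1.18+) is used.

```go
package main

import (
	"fmt"
	"net/netip"
)

// namedCIDR pairs a network role (service, cluster, machine) with its CIDR.
// Hypothetical type, used only for illustration.
type namedCIDR struct {
	name string
	cidr string
}

// validateAdvertiseAddress returns an error if the KAS advertise address
// falls inside any of the given networks. Sketch only, not HyperShift code.
func validateAdvertiseAddress(addr string, networks []namedCIDR) error {
	ip, err := netip.ParseAddr(addr)
	if err != nil {
		return fmt.Errorf("invalid advertise address %q: %w", addr, err)
	}
	for _, n := range networks {
		prefix, err := netip.ParsePrefix(n.cidr)
		if err != nil {
			return fmt.Errorf("invalid %s network CIDR %q: %w", n.name, n.cidr, err)
		}
		if prefix.Contains(ip) {
			return fmt.Errorf("advertise address %s overlaps with the %s network %s", ip, n.name, prefix)
		}
	}
	return nil
}

func main() {
	// Example CIDRs only; real values would come from the HostedCluster spec.
	networks := []namedCIDR{
		{name: "service", cidr: "172.31.0.0/16"},
		{name: "cluster", cidr: "10.132.0.0/14"},
		{name: "machine", cidr: "192.168.126.0/24"},
	}
	fmt.Println(validateAdvertiseAddress("172.20.0.1", networks))  // <nil>
	fmt.Println(validateAdvertiseAddress("172.31.0.10", networks)) // advertise address 172.31.0.10 overlaps with the service network 172.31.0.0/16
}
```

Rejecting the HostedCluster at validation time surfaces the conflict as a clear error instead of a failed deployment.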

https://github.com/openshift/hypershift/pull/3047

Bug OCPBUGS-18649: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-node-tuning-operator/pull/708

Bug OCPBUGS-19203: Update 4.15 ose-cluster-baremetal-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-baremetal-operator/pull/362

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-baremetal-operator/pull/362

Bug OCPBUGS-5571: Update API docs content based on docs review

Michael Burke reviewed the plugin API documentation as part of https://github.com/openshift/openshift-docs/pull/53103. We should update the ts-doc comments in the openshift/console repo based on this review.

cc Olivia Payne Jakub Hadvig

https://github.com/openshift/console/pull/13108

Bug OCPBUGS-24089: Update 4.15 ose-powervs-machine-controllers-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/machine-api-provider-powervs/pull/66

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/machine-api-provider-powervs/pull/66

Bug OCPBUGS-16756: Cluster dropdown items are not marked for i18n when ACM/MCE installed

Description of problem:

After installing ACM/MCE, a dropdown list for switching clusters appears in the top masthead. The items in this dropdown list are not marked for i18n, so there are no translations for other languages.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-07-25-000711

How reproducible:

Always

Steps to Reproduce:

1. From OperatorHub, install the MCE operator and create the required operand with the default settings.
2. After refreshing the browser, check the translation of the clusters dropdown list items "All Clusters" and "local-cluster".

Actual results:

The items are not marked for i18n and have no translations for other languages.

Expected results:

The items should be translated into other languages.

Additional info:

https://github.com/openshift/console/pull/13238

Bug OCPBUGS-17757: GCP CLI authentication should only be allowed in manual mode

Description of problem:

When authenticating with the gcloud CLI, the GCP credentials no longer come from the osServiceAccount.json file. In that case, the installer should only allow the installation to proceed when the Manual credentials mode is used.
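Here is a minimal Go sketch of the check described above. It assumes the installer has already determined where the credentials came from, represented by a hypothetical `credentialSource` value. The type, its constants, and the `validateCredentialsMode` helper are illustrative names, not the installer's API. Only the `Manual` string mirrors the `credentialsMode` field of install-config.yaml.

```go
package main

import "fmt"

// credentialSource is a hypothetical enum for where GCP credentials came from.
type credentialSource int

const (
	// serviceAccountFile: ~/.gcp/osServiceAccount.json or GOOGLE_APPLICATION_CREDENTIALS.
	serviceAccountFile credentialSource = iota
	// gcloudCLI: application default credentials from `gcloud auth application-default login`.
	gcloudCLI
)

// validateCredentialsMode rejects installs that rely on gcloud CLI credentials
// unless install-config.yaml sets credentialsMode to Manual.
func validateCredentialsMode(src credentialSource, credentialsMode string) error {
	if src == gcloudCLI && credentialsMode != "Manual" {
		return fmt.Errorf("credentialsMode must be Manual when using gcloud CLI credentials, got %q", credentialsMode)
	}
	return nil
}

func main() {
	fmt.Println(validateCredentialsMode(gcloudCLI, ""))       // credentialsMode must be Manual when using gcloud CLI credentials, got ""
	fmt.Println(validateCredentialsMode(gcloudCLI, "Manual")) // <nil>
}
```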

Version-Release number of selected component (if applicable):

How reproducible:

Always

Steps to Reproduce:

1. Remove ~/.gcp/osServiceAccount.json
2. Ensure that the GOOGLE_APPLICATION_CREDENTIALS environment variable is not set
3. Run gcloud auth application-default login
4. Run the installer

Actual results:

Install succeeds

Expected results:

The installation should fail with a message noting that the credentials mode is not Manual.

Additional info:

https://github.com/openshift/installer/pull/7422

Bug OCPBUGS-25714: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/ibm-powervs-block-csi-driver-operator/pull/59

Bug OCPBUGS-17090: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-machine-approver/pull/200

Bug OCPBUGS-25161: Avoid eviction of CSI driver daemonsets pods from the cluster-autoscaler

This is a clone of issue OCPBUGS-23306. The following is the description of the original issue:
—
Related with https://issues.redhat.com/browse/OCPBUGS-23000

By default, the cluster-autoscaler evicts all pods from a node that it scales down, including pods that belong to DaemonSets.
For the EFS CSI driver, whose volumes are mounted as NFS volumes, this causes stale NFS mounts and prevents application workloads from terminating gracefully.

Version-Release number of selected component (if applicable):

4.11

How reproducible:

- While a node is being scaled down by the cluster-autoscaler-operator, the DaemonSet pods are evicted.

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

CSI pods should not be evicted by the cluster autoscaler (at least not before the workloads have terminated), because evicting them can cause data corruption.

Additional info:

It is possible to disable eviction of the CSI pods by adding the following annotation to the CSI driver pod:
cluster-autoscaler.kubernetes.io/enable-ds-eviction: "false"
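The annotation is set on the pod. For a DaemonSet, it goes in the pod template metadata so that every pod the DaemonSet creates inherits it. The Go sketch below is a simplified model, with hypothetical names, of how such a per-pod annotation can override an autoscaler-wide default for DaemonSet pod eviction. `shouldEvictDaemonSetPod` and its `defaultEvict` parameter are illustrative and are not the cluster-autoscaler's actual code.

```go
package main

import "fmt"

// enableDSEvictionAnnotation is the per-pod annotation quoted in the issue.
const enableDSEvictionAnnotation = "cluster-autoscaler.kubernetes.io/enable-ds-eviction"

// shouldEvictDaemonSetPod is a simplified, hypothetical model of the decision:
// an explicit "true" or "false" annotation overrides the autoscaler-wide default.
func shouldEvictDaemonSetPod(annotations map[string]string, defaultEvict bool) bool {
	switch annotations[enableDSEvictionAnnotation] {
	case "true":
		return true
	case "false":
		return false
	default:
		return defaultEvict
	}
}

func main() {
	csiPod := map[string]string{enableDSEvictionAnnotation: "false"}
	otherPod := map[string]string{}
	fmt.Println(shouldEvictDaemonSetPod(csiPod, true))   // false
	fmt.Println(shouldEvictDaemonSetPod(otherPod, true)) // true
}
```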

Bug OCPBUGS-21924: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Story WRKLDS-908: Promote some experimental commands, deprecate others

Some experimental commands have existed for so long and are used so regularly that they are considered GA. Other commands are no longer useful.

Bug OCPBUGS-24098: Update 4.15 ose-cluster-kube-apiserver-operator-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-kube-apiserver-operator/pull/1589

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-kube-apiserver-operator/pull/1589

Bug OCPBUGS-25648: vsphere-problem-detector-operator pod CrashLoopBackOff with panic

This is a clone of issue OCPBUGS-25372. The following is the description of the original issue:
—
Description of problem:

In QE's CI (with the vsphere-agent profile), the storage CO is not available and the vsphere-problem-detector-operator pod is in CrashLoopBackOff with a panic.
(The must-gather is available here: https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-vsphere-agent-disconnected-ha-f14/1734850632575094784/artifacts/vsphere-agent-disconnected-ha-f14/gather-must-gather/)

The storage CO reports "unable to find VM by UUID":
  - lastTransitionTime: "2023-12-13T09:15:27Z"
    message: "VSphereCSIDriverOperatorCRAvailable: VMwareVSphereControllerAvailable:
      unable to find VM ci-op-782gwsbd-b3d4e-master-2 by UUID \nVSphereProblemDetectorDeploymentControllerAvailable:
      Waiting for Deployment"
    reason: VSphereCSIDriverOperatorCR_VMwareVSphereController_vcenter_api_error::VSphereProblemDetectorDeploymentController_Deploying
    status: "False"
    type: Available
(However, I did not find the "unable to find VM by UUID" message in the vsphere-problem-detector-operator log in the must-gather.)

The vsphere-problem-detector-operator log:
2023-12-13T10:10:56.620216117Z I1213 10:10:56.620159       1 vsphere_check.go:149] Connected to vcenter.devqe.ibmc.devcluster.openshift.com as ci_user_01@devqe.ibmc.devcluster.openshift.com
2023-12-13T10:10:56.625161719Z I1213 10:10:56.625108       1 vsphere_check.go:271] CountVolumeTypes passed
2023-12-13T10:10:56.625291631Z I1213 10:10:56.625258       1 zones.go:124] Checking tags for multi-zone support.
2023-12-13T10:10:56.625449771Z I1213 10:10:56.625433       1 zones.go:202] No FailureDomains configured.  Skipping check.
2023-12-13T10:10:56.625497726Z I1213 10:10:56.625487       1 vsphere_check.go:271] CheckZoneTags passed
2023-12-13T10:10:56.625531795Z I1213 10:10:56.625522       1 info.go:44] vCenter version is 8.0.2, apiVersion is 8.0.2.0 and build is 22617221
2023-12-13T10:10:56.625562833Z I1213 10:10:56.625555       1 vsphere_check.go:271] ClusterInfo passed
2023-12-13T10:10:56.625603236Z I1213 10:10:56.625594       1 datastore.go:312] checking datastore /DEVQEdatacenter/datastore/vsanDatastore for permissions
2023-12-13T10:10:56.669205822Z panic: runtime error: invalid memory address or nil pointer dereference
2023-12-13T10:10:56.669338411Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x23096cb]
2023-12-13T10:10:56.669565413Z 
2023-12-13T10:10:56.669591144Z goroutine 550 [running]:
2023-12-13T10:10:56.669838383Z github.com/openshift/vsphere-problem-detector/pkg/operator.getVM(0xc0005da6c0, 0xc0002d3b80)
2023-12-13T10:10:56.669991749Z     github.com/openshift/vsphere-problem-detector/pkg/operator/vsphere_check.go:319 +0x3eb
2023-12-13T10:10:56.670212441Z github.com/openshift/vsphere-problem-detector/pkg/operator.(*vSphereChecker).enqueueSingleNodeChecks.func1()
2023-12-13T10:10:56.670289644Z     github.com/openshift/vsphere-problem-detector/pkg/operator/vsphere_check.go:238 +0x55
2023-12-13T10:10:56.670490453Z github.com/openshift/vsphere-problem-detector/pkg/operator.(*CheckThreadPool).worker.func1(0xc000c88760?, 0x0?)
2023-12-13T10:10:56.670702592Z     github.com/openshift/vsphere-problem-detector/pkg/operator/pool.go:40 +0x55
2023-12-13T10:10:56.671142070Z github.com/openshift/vsphere-problem-detector/pkg/operator.(*CheckThreadPool).worker(0xc000c78660, 0xc000c887a0?)
2023-12-13T10:10:56.671331852Z     github.com/openshift/vsphere-problem-detector/pkg/operator/pool.go:41 +0xe7
2023-12-13T10:10:56.671529761Z github.com/openshift/vsphere-problem-detector/pkg/operator.NewCheckThreadPool.func1()
2023-12-13T10:10:56.671589925Z     github.com/openshift/vsphere-problem-detector/pkg/operator/pool.go:28 +0x25
2023-12-13T10:10:56.671776328Z created by github.com/openshift/vsphere-problem-detector/pkg/operator.NewCheckThreadPool
2023-12-13T10:10:56.671847478Z     github.com/openshift/vsphere-problem-detector/pkg/operator/pool.go:27 +0x73

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2023-12-11-033133

How reproducible:

Steps to Reproduce:

    1. See description

Actual results:

   vpd panics.

Expected results:

   vpd should not panic

Additional info:

   I suspect it is a privileges issue, but the pod should not panic regardless.
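The trace shows a nil pointer dereference in `getVM` (vsphere_check.go:319). The actual fix is in the linked PR and is not reproduced here. The Go sketch below only illustrates the general defensive pattern. It assumes a lookup that can return no VM and no error when the UUID does not match. The `VM` type, `findVMByUUID`, and the in-memory inventory are hypothetical stand-ins for a vCenter search.

```go
package main

import (
	"errors"
	"fmt"
)

// VM is a hypothetical, trimmed-down view of a vSphere virtual machine.
type VM struct {
	Name string
	UUID string
}

// findVMByUUID stands in for a vCenter search. It returns (nil, nil) when
// nothing matches, which is the case a caller must not dereference.
func findVMByUUID(inventory map[string]*VM, uuid string) (*VM, error) {
	if inventory == nil {
		return nil, errors.New("inventory not available")
	}
	return inventory[uuid], nil
}

// getVM checks both the error and the nil result before using the VM, so a
// missing VM becomes a reported check failure instead of a SIGSEGV.
func getVM(inventory map[string]*VM, uuid string) (*VM, error) {
	vm, err := findVMByUUID(inventory, uuid)
	if err != nil {
		return nil, fmt.Errorf("error searching for VM by UUID %s: %w", uuid, err)
	}
	if vm == nil {
		return nil, fmt.Errorf("unable to find VM by UUID %s", uuid)
	}
	return vm, nil
}

func main() {
	inventory := map[string]*VM{"4210-aaaa": {Name: "master-0", UUID: "4210-aaaa"}}
	if _, err := getVM(inventory, "4210-bbbb"); err != nil {
		fmt.Println("check failed:", err) // check failed: unable to find VM by UUID 4210-bbbb
	}
}
```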

https://github.com/openshift/vsphere-problem-detector/pull/149

Bug OCPBUGS-19352: Node in NotReady state when unified_cgroup_hierarchy=1 is set

Description of problem:

In a baremetal multinode OCP cluster, a node ends up in the NotReady state.

On the node there are a couple of failed services:
● cpuset-configure.service         loaded failed failed Move services to reserved cpuset
● on-prem-resolv-prepender.service loaded failed failed Populates resolv.conf according to on-prem IPI needs

journalctl --boot --no-pager -u cpuset-configure.service
Sep 18 16:57:37 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com systemd[1]: Starting Move services to reserved cpuset...
Sep 18 16:57:37 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com cpuset-configure.sh[3014]: /usr/local/bin/cpuset-configure.sh: line 17: /sys/fs/cgroup/cpuset/cpuset.sched_load_balance: Read-only file system
Sep 18 16:57:38 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com systemd[1]: cpuset-configure.service: Main process exited, code=exited, status=1/FAILURE
Sep 18 16:57:38 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com systemd[1]: cpuset-configure.service: Failed with result 'exit-code'.
Sep 18 16:57:38 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com systemd[1]: Failed to start Move services to reserved cpuset.

Sep 18 16:57:52 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com systemd[1]: Failed to start Populates resolv.conf according to on-prem IPI needs.
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com systemd[1]: Starting Populates resolv.conf according to on-prem IPI needs...
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4852]: nameserver 10.47.242.10
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4851]: NM resolv-prepender: Starting download of baremetal runtime cfg image
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:23012b3380ffce706aa8f204cdc26745d8a69b0218150ec3bcb495202694fdab...
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: Getting image source signatures
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: Copying blob sha256:916ead524b9e54b9d5534b65534253c02ce66f1d784e683389aa3c4cb4d12389
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: Copying blob sha256:d8190195889efb5333eeec18af9b6c82313edd4db62989bd3a357caca4f13f0e
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: Copying blob sha256:c71d2589fba7989ecd29ea120fe7add01fab70126fc653a863d5844e35ee5403
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: Copying blob sha256:97da74cc6d8fa5d1634eb1760fd1da5c6048619c264c23e62d75f3bf6b8ef5c4
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: Copying blob sha256:d4dc6e74b6ce09e24dc284cc1967451f3dda2d485bc92fc95d24d91f939e4849
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: Copying config sha256:ba2c86ef11c4e341cd0870b6d5b7ad39aa39724389d9d2dfead4ea3d75582071
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: Writing manifest to image destination
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: Storing signatures
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: ba2c86ef11c4e341cd0870b6d5b7ad39aa39724389d9d2dfead4ea3d75582071
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4851]: NM resolv-prepender: Download of baremetal runtime cfg image completed
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4863]: Your kernel does not support pids limit capabilities or the cgroup is not mounted. PIDs limit discarded.
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4863]: Error: OCI runtime error: runc: runc create failed: mountpoint for devices not found
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com systemd[1]: on-prem-resolv-prepender.service: Main process exited, code=exited, status=127/n/a

When checking the cgroup configuration:

oc describe node.config
Name:         cluster
Namespace:
Labels:       <none>
Annotations:  include.release.openshift.io/ibm-cloud-managed: true
              include.release.openshift.io/self-managed-high-availability: true
              include.release.openshift.io/single-node-developer: true
              release.openshift.io/create-only: true
API Version:  config.openshift.io/v1
Kind:         Node
Metadata:
  Creation Timestamp:  2023-09-18T15:27:44Z
  Generation:          3
  Owner References:
    API Version:     config.openshift.io/v1
    Kind:            ClusterVersion
    Name:            version
    UID:             c62da215-6526-4306-8fc6-035612c8605e
  Resource Version:  91518
  UID:               cf2189ba-cd69-45e9-868c-7c2589decb25
Spec:
  Cgroup Mode:  v1
Events:         <none>

Version-Release number of selected component (if applicable):

4.14.0-rc.1

How reproducible:

so far 100%

Steps to Reproduce:

1. Deploy baremetal multinode cluster with GitOps-ZTP workflow

Actual results:

While all policies report the Compliant state, some configs are still being applied:

oc get mcp
NAME       CONFIG                                               UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
ht100gb    rendered-ht100gb-572f5aef443a21b21a8c5cfe816708e2    False     True       False      2              0                   0                     0                      77m
master     rendered-master-3c44ec28c389693028ad2cc6b74741ca     True      False      False      3              3                   3                     0                      103m
standard   rendered-standard-1942568110455a377b735e15f18c7ba8   True      False      False      2              2                   2                     0                      77m
worker     rendered-worker-033d4f0a2568efce241d02a2c54ab88e     True      False      False      0              0                   0                     0                      103m

Expected results:

All nodes are in Ready state

Additional info:

https://github.com/openshift/machine-config-operator/pull/3972

Bug OCPBUGS-21790: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/machine-api-provider-azure/pull/78

Bug OCPBUGS-18990: cluster-restore.sh does not move static pods back

Description of problem:

The script refactoring from https://github.com/openshift/cluster-etcd-operator/pull/1057 introduced a regression. 

Because the static pod list variable was renamed, the list is now empty and the script no longer restores the non-etcd pod YAMLs.

Version-Release number of selected component (if applicable):

4.14 and later

How reproducible:

always

Steps to Reproduce:

1. create a cluster
2. restore using cluster-restore.sh

Actual results:

The kube-apiserver and other static pods are not restored immediately.

The script only outputs this log:

removing previous backup /var/lib/etcd-backup/member
Moving etcd data-dir /var/lib/etcd/member to /var/lib/etcd-backup
starting restore-etcd static pod

Expected results:

The non-etcd static pods should be restored immediately by moving them back into the manifest directory.

This is visible in the expected log output:

Moving etcd data-dir /var/lib/etcd/member to /var/lib/etcd-backup
starting restore-etcd static pod
starting kube-apiserver-pod.yaml
static-pod-resources/kube-apiserver-pod-7/kube-apiserver-pod.yaml
starting kube-controller-manager-pod.yaml
static-pod-resources/kube-controller-manager-pod-7/kube-controller-manager-pod.yaml
starting kube-scheduler-pod.yaml
static-pod-resources/kube-scheduler-pod-8/kube-scheduler-pod.yaml

Additional info:

https://github.com/openshift/cluster-etcd-operator/pull/1111

Bug OCPBUGS-21745: Azure CCM unable to manage Load Balancer in Azure Managed Identity Installs

Description of problem:

After installing 4.14.0-rc.6 in a cluster with private load balancer publishing and existing VNets, Services of type LoadBalancer lack the permissions necessary to sync.

Version-Release number of selected component (if applicable):

4.14.0-rc.6

How reproducible:

Seemingly 100%

Steps to Reproduce:

1. Install with Azure Managed Identity into an existing VNet with private LB publishing

Actual results:

                One or more other status conditions indicate a degraded state: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 403, RawError: {"error":{"code":"AuthorizationFailed","message":"The client '194d5669-cb47-4199-a673-4b32a4a110be' with object id '194d5669-cb47-4199-a673-4b32a4a110be' does not have authorization to perform action 'Microsoft.Network/virtualNetworks/subnets/read' over scope '/subscriptions/14b86a40-8d8f-4e69-abaf-42cbb0b8a331/resourceGroups/net/providers/Microsoft.Network/virtualNetworks/rnd-we-net/subnets/paas1' or the scope is invalid. If access was recently granted, please refresh your credentials."}}

Operators dependent on Ingress are failing as well.
authentication                             4.14.0-rc.6   False       False         True       149m    OAuthServerRouteEndpointAccessibleControllerAvailable: Get https://oauth-openshift.apps.cnb10161.rnd.westeurope.example.com/healthz: dial tcp: lookup oauth-openshift.apps.cnb10161.rnd.westeurope.example.com on 10.224.0.10:53: no such host (this is likely result of malfunctioning DNS server)
console                                    4.14.0-rc.6   False       True          False      142m    DeploymentAvailable: 0 replicas available for console deployment...

Expected results:

Successful install

Additional info:

The client ID in the error corresponds to “openshift-cloud-controller-manager-azure-cloud-credentials”. Checking its Azure managed identity confirms that it only has access to the cluster resource group (RG), not the network RG.

Additionally, it was noted that this permission is granted to the MAPI roles but not to the CCM roles.

https://github.com/openshift/cloud-credential-operator/pull/607

Bug OCPBUGS-18971: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/k8s-prometheus-adapter/pull/76

Bug OCPBUGS-19970: Home-Projects-Default-workloads-AddPage: upload JAR file's i18n misses

Description of problem:

Change the UI to a non-en_US locale.
Navigate to Home - Projects - Default - Workloads - Add Page.
Click 'Upload JAR file'.
The "Browse" and "Clear" labels are in English.
See the reference screenshot.

Version-Release number of selected component (if applicable):

4.14.0-rc.2

How reproducible:

Steps to Reproduce:

1.
2.
3.

Actual results:

Content is in English

Expected results:

Content should be localized

Additional info:

Reference screenshot
https://drive.google.com/file/d/1hgP_Rnkn4J4_gVC-T8pUUvAEiAWbfrJq/view?usp=drive_link

Bug OCPBUGS-22058: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/installer/pull/7590

Bug OCPBUGS-23472: ignition-server-proxy deployment fails on y-stream upgrade 4.13->4.14

ignition-server-proxy pods fail to start after a y-stream upgrade because the deployment is still configured with a ServiceAccount that was set in 4.13 and deleted in 4.14 by PR https://github.com/openshift/hypershift/pull/2778. The 4.14 reconciliation does not unset the ServiceAccount that was set in 4.13.
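One way to keep stale fields from surviving an upgrade is for the reconciler to set every field it owns, including clearing fields that the current version no longer uses. The Go sketch below assumes a hypothetical, trimmed-down `PodTemplateSpec` and a hypothetical `reconcileIgnitionProxy` function. It is not the HyperShift code, and the ServiceAccount name and image values are illustrative.

```go
package main

import "fmt"

// PodTemplateSpec is a hypothetical stand-in for the parts of a Deployment's
// pod template that this example cares about.
type PodTemplateSpec struct {
	ServiceAccountName string
	Image              string
}

// reconcileIgnitionProxy sets every field the controller owns. A field the
// current version no longer uses is reset explicitly rather than left alone,
// so a value written by an older version does not survive the upgrade.
func reconcileIgnitionProxy(spec *PodTemplateSpec, image string) {
	spec.Image = image
	spec.ServiceAccountName = "" // explicitly clear the no-longer-used field
}

func main() {
	// Template as an older version might have left it (illustrative values).
	spec := &PodTemplateSpec{ServiceAccountName: "old-proxy-sa", Image: "old"}
	reconcileIgnitionProxy(spec, "new")
	fmt.Printf("%q %q\n", spec.ServiceAccountName, spec.Image) // "" "new"
}
```

An alternative with the same effect is to build the desired pod template from scratch on every reconcile and overwrite the existing one, so nothing written by an older version can persist.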

https://github.com/openshift/hypershift/pull/3209

Bug OCPBUGS-19426: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/console/pull/13333

Bug OCPBUGS-22246: failure creating openstack LoadBalancer when workers include a provider Network secondary interface

Description of problem:

When creating a Service of type LoadBalancer while the workers include provider network secondary interfaces, the CCM reports the following error:

2023-10-21T04:16:47.399130931Z E1021 04:16:47.398913       1 controller.go:291] error processing service lb-tcp-verification-ns/lb-tcp-verification-svc (will retry): failed to ensure load balancer: failed when reconciling security groups for LB service lb-tcp-verification-ns/lb-tcp-verification-svc: failed to update security group for port 6d2389f1-5f47-4130-bb00-2dc61a6af1e4: Bad request with: [PUT https://overcloud.redhat.local:13696/v2.0/ports/6d2389f1-5f47-4130-bb00-2dc61a6af1e4], error message: {"NeutronError": {"type": "PortSecurityAndIPRequiredForSecurityGroups", "message": "Port security must be enabled and port must have an IP address in order to use security groups.", "detail": ""}}
2023-10-21T04:16:47.399130931Z I1021 04:16:47.399031       1 event.go:307] "Event occurred" object="lb-tcp-verification-ns/lb-tcp-verification-svc" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: failed when reconciling security groups for LB service lb-tcp-verification-ns/lb-tcp-verification-svc: failed to update security group for port 6d2389f1-5f47-4130-bb00-2dc61a6af1e4: Bad request with: [PUT https://overcloud.redhat.local:13696/v2.0/ports/6d2389f1-5f47-4130-bb00-2dc61a6af1e4], error message: {\"NeutronError\": {\"type\": \"PortSecurityAndIPRequiredForSecurityGroups\", \"message\": \"Port security must be enabled and port must have an IP address in order to use security groups.\", \"detail\": \"\"}}

Version-Release number of selected component (if applicable): 4.14 with both the OVN-K and OpenShiftSDN network types. The issue is observed with the latest delivered 17.1 and 16.2 puddles. This issue is a regression: it is not observed in 4.13 and earlier.
How reproducible: Always
Steps to Reproduce: Deploy cluster with provider network secondary interfaces and create a service with type:LoadBalancer.
Actual results: The service never becomes ready.
Expected results: The service is working as expected.
Additional info: Must-gather provided on private comment.
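The Neutron error states that a port must have port security enabled and an IP address before security groups can be applied to it. Below is a minimal Go sketch that filters ports on those two conditions before updating security groups, so that ports failing them are skipped instead of failing the whole reconciliation. The `Port` type and the `portsForSecurityGroups` helper are hypothetical. They reflect neither the cloud-provider-openstack code nor the gophercloud API.

```go
package main

import "fmt"

// Port is a hypothetical subset of a Neutron port's fields.
type Port struct {
	ID                  string
	PortSecurityEnabled bool
	FixedIPs            []string
}

// portsForSecurityGroups keeps only the ports that Neutron accepts security
// groups on: port security enabled and at least one IP address. Other ports,
// such as provider-network secondary interfaces without port security, are skipped.
func portsForSecurityGroups(ports []Port) []Port {
	var eligible []Port
	for _, p := range ports {
		if p.PortSecurityEnabled && len(p.FixedIPs) > 0 {
			eligible = append(eligible, p)
		}
	}
	return eligible
}

func main() {
	ports := []Port{
		{ID: "primary", PortSecurityEnabled: true, FixedIPs: []string{"10.196.0.10"}},
		{ID: "provider-net", PortSecurityEnabled: false},
	}
	for _, p := range portsForSecurityGroups(ports) {
		fmt.Println(p.ID) // primary
	}
}
```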

https://github.com/openshift/cloud-provider-openstack/pull/241

Bug OCPBUGS-18830: [AWS SC2S] ec2:DescribeSecurityGroupRules is not supported in SC2S region.

Description of problem:

Installing a cluster in the SC2S region fails with:

level=error msg=Error: reading Security Group (sg-0b0cd054dd599602f) Rules: UnsupportedOperation: The functionality you requested is not available in this region.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-09-11-201102

How reproducible:

Always

Steps to Reproduce:

1. Create an OCP cluster on SC2S

Actual results:

The installation fails:
level=error msg=Error: reading Security Group (sg-0b0cd054dd599602f) Rules: UnsupportedOperation: The functionality you requested is not available in this region.

Expected results:

The installation succeeds.

Additional info:

* C2S region is not affected

https://github.com/openshift/installer/pull/7491

Bug OCPBUGS-13597: Failed to create STS resources in China regions using ccoctl

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/cloud-credential-operator/pull/596

Bug OCPBUGS-17674: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/openshift-apiserver/pull/387

Bug OCPBUGS-17877: One physical interface of a LACP bonding gets renamed after 4.12 to 4.13 upgrade

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/machine-config-operator/pull/4020

Bug OCPBUGS-18852: Update 4.15 atomic-openshift-cluster-autoscaler image to be consistent with ART

Please review the following PR: https://github.com/openshift/kubernetes-autoscaler/pull/260

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/kubernetes-autoscaler/pull/260

Task AGENT-670: Add cluster+host status and validation information from database into agent-gather logs

The validation and status information shown by "wait-for bootstrap-complete" is sometimes inadequate or difficult to decipher because of the number of lines it prints. The status and validation information is stored in the assisted-service database. agent-gather should query the database and write the status/status_info columns for the cluster and hosts to a separate log file. A quick glance at this file would make triaging easier and faster.

https://github.com/openshift/installer/pull/7719

Bug OCPBUGS-19024: remove duplicate metric for techpreview featuregate

When the controller was moved, the existing metric was not removed, which left a duplicate metric.

https://github.com/openshift/cluster-kube-apiserver-operator/pull/1547

Bug OCPBUGS-24066: Update 4.15 atomic-openshift-cluster-autoscaler-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/kubernetes-autoscaler/pull/270

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/kubernetes-autoscaler/pull/270

Bug OCPBUGS-14053: Critical Alert Rules do not have runbook url

Description of problem:

Critical Alert Rules do not have runbook url

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

This bug is being raised by the OpenShift Monitoring team as part of an effort to detect invalid alert rules in OCP.

1. Check the details of the MultipleDefaultStorageClasses alert rule.

Actual results:

The MultipleDefaultStorageClasses alert rule has critical severity but does not have a runbook_url annotation.

Expected results:

All critical alert rules must have a runbook_url annotation.

Additional info:

Critical alerts must have a runbook; refer to the style guide at https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md#style-guide

The runbooks are located at github.com/openshift/runbooks

To resolve the bug:
- Add runbooks for the relevant Alerts at github.com/openshift/runbooks
- Add the link to the runbook in the Alert annotation 'runbook_url'
- Remove the exception in the origin test, added in PR https://github.com/openshift/origin/pull/27933
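A check along the lines of the one the origin test performs could look like the Python sketch below. It is not the origin test's code; it assumes the rules have already been parsed from a PrometheusRule's spec.groups into dictionaries, and it flags every alert labelled severity: critical that lacks a non-empty runbook_url annotation.

```python
from typing import Any, Dict, List


def critical_rules_missing_runbook(groups: List[Dict[str, Any]]) -> List[str]:
    """Return the names of critical alerting rules without a runbook_url annotation.

    `groups` mirrors the spec.groups list of a PrometheusRule after YAML parsing.
    """
    missing = []
    for group in groups:
        for rule in group.get("rules", []):
            if "alert" not in rule:
                continue  # recording rules are not alerts, so skip them
            severity = rule.get("labels", {}).get("severity")
            runbook = rule.get("annotations", {}).get("runbook_url")
            if severity == "critical" and not runbook:
                missing.append(rule["alert"])
    return missing
```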

https://github.com/openshift/origin/pull/28042

Bug OCPBUGS-18690: [azure] Fail to provision bootstrap node with vm size in family standardEIBDSv5Family and standardEIBSv5Family

Description of problem:

In install-config.yaml, set the control plane instance type to a VM size in the standardEIBDSv5Family or standardEIBSv5Family family. The installer returns the following error when creating the cluster:
----------------------------
09-07 17:55:57.613  level=error msg=Error: creating Linux Virtual Machine: (Name "jima-test-wlgrr-bootstrap" / Resource Group "jima-test-wlgrr-rg"): compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidParameter" Message="The VM size 'Standard_E112ibs_v5' cannot boot with OS image or disk. Please check that disk controller types supported by the OS image or disk is one of the supported disk controller types for the VM size 'Standard_E112ibs_v5'. Please query sku api at https://aka.ms/azure-compute-skus  to determine supported disk controller types for the VM size." Target="vmSize"

Both VM families support only the NVMe disk controller type:
      {
        "name": "DiskControllerTypes",
        "value": "NVMe"
      },

According to https://github.com/hashicorp/terraform-provider-azurerm/issues/22058, the Terraform azurerm provider does not support setting the disk controller type.

Suggest adding validation for these families, as was done in https://github.com/openshift/installer/pull/6733
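A minimal Python sketch of such a pre-check, assuming the capability list for each VM size has already been fetched from the Azure resource SKU API in the name/value shape shown above. The function names, the comma-separated value format, and the rule that a size without a DiskControllerTypes capability is SCSI-capable are illustrative assumptions, not taken from the installer.

```python
from typing import Dict, List


def supports_scsi(capabilities: List[Dict[str, str]]) -> bool:
    """Return True if a VM size can use a SCSI disk controller.

    Assumptions: a size that does not report DiskControllerTypes is treated as
    SCSI-capable, and the value may list several types separated by commas.
    """
    for cap in capabilities:
        if cap.get("name") == "DiskControllerTypes":
            types = {t.strip().upper() for t in cap.get("value", "").split(",")}
            return "SCSI" in types
    return True


def validate_instance_type(vm_size: str, capabilities: List[Dict[str, str]]) -> None:
    """Fail early for NVMe-only VM sizes instead of failing during provisioning."""
    if not supports_scsi(capabilities):
        raise ValueError(
            f"instance type {vm_size} supports only NVMe disk controllers, "
            "which this installer version cannot configure"
        )
```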

Version-Release number of selected component (if applicable):

4.14 nightly build

How reproducible:

always

Steps to Reproduce:

1. Prepare install-config.yaml and set the control plane VM size to one in the standardEIBDSv5Family or standardEIBSv5Family family.
2. Create the cluster.

Actual results:

The installer failed with the error above.

Expected results:

The installer should pre-check for these unsupported instance types and exit with an error message.

Additional info:

https://github.com/openshift/installer/pull/7500

Bug OCPBUGS-23539: Bogus warning message when creating manifests

Description of problem:

Creating the installation manifests results in a bogus warning message about discarding existing manifests, even though none exist.

Version-Release number of selected component (if applicable):

Tested on 4.15 dev, but the problem appears to have been present since 4.2.

How reproducible:

100%

Steps to Reproduce:

1. Start with an empty dir containing only an install-config.yaml with platform: baremetal
2. Run "openshift-install create manifests"
3. There is no step 3

Actual results:

INFO Consuming Install Config from target directory 
WARNING Discarding the Openshift Manifests that was provided in the target directory because its dependencies are dirty and it needs to be regenerated 
INFO Manifests created in: test/manifests and test/openshift

Expected results:

INFO Consuming Install Config from target directory           
INFO Manifests created in: test/manifests and test/openshift

Additional info:

The issue is due to multiple assets referencing the same files.
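To see which assets overlap, one could map each generated file to every asset that claims it. In this Python sketch the asset names and paths are hypothetical inputs; it does not use the installer's asset API and only illustrates the overlap check.

```python
from collections import defaultdict
from typing import Dict, List, Set


def shared_files(asset_files: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each file path referenced by more than one asset to those asset names."""
    owners: Dict[str, Set[str]] = defaultdict(set)
    for asset, paths in asset_files.items():
        for path in paths:
            owners[path].add(asset)
    # Keep only paths claimed by two or more assets, with sorted owner names.
    return {path: sorted(names) for path, names in owners.items() if len(names) > 1}
```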

https://github.com/openshift/installer/pull/7753

Bug OCPBUGS-18945: [4.15] Bootimage bump tracker

Tracker issue for bootimage bump in 4.15. This issue should block issues which need a bootimage bump to fix.

The previous bump was ~~OCPBUGS-12868~~.

https://github.com/openshift/installer/pull/7499

Bug OCPBUGS-19231: Update 4.15 ose-cluster-openshift-apiserver-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-openshift-apiserver-operator/pull/548

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-openshift-apiserver-operator/pull/548

Bug OCPBUGS-26206: cluster-monitoring-operator watches on metal-ipi are higher

This is a clone of issue ~~OCPBUGS-26069~~. The following is the description of the original issue:
—
Component Readiness has found a potential regression in [sig-arch][Late] operators should not create watch channels very often [apigroup:apiserver.openshift.io] [Suite:openshift/conformance/parallel].

Probability of significant regression: 98.46%

Sample (being evaluated) Release: 4.15
Start Time: 2023-12-29T00:00:00Z
End Time: 2024-01-04T23:59:59Z
Success Rate: 83.33%
Successes: 15
Failures: 3
Flakes: 0

Base (historical) Release: 4.14
Start Time: 2023-10-04T00:00:00Z
End Time: 2023-10-31T23:59:59Z
Success Rate: 98.36%
Successes: 120
Failures: 2
Flakes: 0
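The reported success rates follow from the raw counts. Assuming success rate = successes / (successes + failures), which reproduces both figures because flakes are 0 in both windows, the arithmetic is as follows (the 98.46% regression probability comes from Sippy's own statistics and is not recomputed here):

```python
def success_rate(successes: int, failures: int) -> float:
    """Percentage of runs that passed; flakes are 0 in both windows above."""
    return 100.0 * successes / (successes + failures)


sample = success_rate(15, 3)   # 4.15 sample window
base = success_rate(120, 2)    # 4.14 base window
print(f"sample {sample:.2f}%, base {base:.2f}%")  # sample 83.33%, base 98.36%
```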

View the test details report at https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&arch=amd64&baseEndTime=2023-10-31%2023%3A59%3A59&baseRelease=4.14&baseStartTime=2023-10-04%2000%3A00%3A00&capability=Other&component=Unknown&confidence=95&environment=sdn%20no-upgrade%20amd64%20metal-ipi%20serial&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=sdn&network=sdn&pity=5&platform=metal-ipi&platform=metal-ipi&sampleEndTime=2024-01-04%2023%3A59%3A59&sampleRelease=4.15&sampleStartTime=2023-12-29%2000%3A00%3A00&testId=openshift-tests%3A9ff4e9b171ea809e0d6faf721b2fe737&testName=%5Bsig-arch%5D%5BLate%5D%20operators%20should%20not%20create%20watch%20channels%20very%20often%20%5Bapigroup%3Aapiserver.openshift.io%5D%20%5BSuite%3Aopenshift%2Fconformance%2Fparallel%5D&upgrade=no-upgrade&upgrade=no-upgrade&variant=serial&variant=serial

https://github.com/openshift/origin/pull/28506

Bug OCPBUGS-18862: Update 4.15 ironic-agent image to be consistent with ART

Please review the following PR: https://github.com/openshift/ironic-agent-image/pull/88

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/ironic-agent-image/pull/88

Bug OCPBUGS-21637: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Bug OCPBUGS-27035: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/csi-driver-shared-resource-operator/pull/96

Bug OCPBUGS-19216: Update 4.15 coredns image to be consistent with ART

Please review the following PR: https://github.com/openshift/coredns/pull/95

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/coredns/pull/95

Bug OCPBUGS-19123: Update 4.15 ose-cloud-credential-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/cloud-credential-operator/pull/600

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cloud-credential-operator/pull/600

Bug OCPBUGS-19130: Update 4.15 ose-installer image to be consistent with ART

Please review the following PR: https://github.com/openshift/installer/pull/7493

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/installer/pull/7493

Bug OCPBUGS-19132: Update 4.15 csi-livenessprobe image to be consistent with ART

Please review the following PR: https://github.com/openshift/csi-livenessprobe/pull/47

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/csi-livenessprobe/pull/47

Bug OCPBUGS-24610: Update 4.15 ose-ovn-kubernetes-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/ovn-kubernetes/pull/1963

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/ovn-kubernetes/pull/1963

Bug OCPBUGS-26936: Image registry operator does not support new PowerVS regions

This is a clone of issue OCPBUGS-26767. The following is the description of the original issue:
—
Description of problem:

[inner hamzy@li-3d08e84c-2e1c-11b2-a85c-e2db7bb078fc hamzy-release]$ oc get co/image-registry
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
image-registry             False       True          True       50m     Available: The deployment does not exist...
[inner hamzy@li-3d08e84c-2e1c-11b2-a85c-e2db7bb078fc hamzy-release]$ oc describe co/image-registry
...
    Message:               Progressing: Unable to apply resources: unable to sync storage configuration: cos region corresponding to a powervs region wdc not found
...

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-ppc64le-2024-01-10-083055

How reproducible:

Always

Steps to Reproduce:

    1. Deploy a PowerVS cluster in wdc06 zone

Actual results:

See above error message

Expected results:

Cluster deploys

https://github.com/openshift/cluster-image-registry-operator/pull/988

Bug OCPBUGS-13204: customNodeDeployment YAML script not working

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/console/pull/13148

Bug OCPBUGS-19268: Update 4.15 ose-cluster-ingress-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-ingress-operator/pull/977

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-ingress-operator/pull/977

Bug OCPBUGS-21878: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/machine-api-provider-powervs/pull/53

Bug OCPBUGS-17641: [Multi-NIC]EgressIP was not added to secondary NIC on egress node after apply the configuration

Description of problem:

Version-Release number of selected component (if applicable):

pre-merge testing or 4.14.0-0.nightly-2023-08-20-085537

How reproducible:

Always

Steps to Reproduce:

1. Label one worker node as an egress node and enable IP forwarding on it.
2. Create an EgressIP object; it is assigned to the egress node:
oc get egressip
NAME         EGRESSIPS      ASSIGNED NODE                         ASSIGNED EGRESSIPS
egressip-1   172.22.0.100   worker-2.sriov.openshift-qe.sdn.com   172.22.0.100

oc get egressip -o yaml
apiVersion: v1
items:
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    creationTimestamp: "2023-08-11T03:46:19Z"
    generation: 7
    name: egressip-1
    resourceVersion: "169277"
    uid: 7692bea5-c072-41e5-aa7a-acfa737a5428
  spec:
    egressIPs:
    - 172.22.0.100
    namespaceSelector:
      matchLabels:
        name: qe
  status:
    items:
    - egressIP: 172.22.0.100
      node: worker-2.sriov.openshift-qe.sdn.com
kind: List
metadata:
  resourceVersion: ""

3. Create a namespace named test with some pods in it, and add a label to the namespace that matches the EgressIP object.
4. From a pod, access the bastion host.

Actual results:

Outgoing traffic timed out.

On the bastion node, no MAC address was resolved for the egress IP:
? (172.22.0.100) at <incomplete> on sriovpr

The egress IP was not added to the secondary NIC on the egress node:
 oc debug node/worker-2.sriov.openshift-qe.sdn.com
Temporary namespace openshift-debug-crpt9 is created for debugging node...
Starting pod/worker-2sriovopenshift-qesdncom-debug-s857l ...
To use host binaries, run `chroot /host`
Pod IP: 192.168.111.25
If you don't see a command prompt, try pressing enter.
sh-4.4# ip a show enp1s0
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:32:ca:4e:a8:bf brd ff:ff:ff:ff:ff:ff
    inet 172.22.0.50/24 scope global enp1s0
       valid_lft forever preferred_lft forever
    inet6 fd00:1101::65fe:9a70:ab40:4c1a/128 scope global dynamic noprefixroute 
       valid_lft 85269sec preferred_lft 85269sec
    inet6 fe80::232:caff:fe4e:a8bf/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

Expected results:

EgressIP works well on secondary NIC

Additional info:

https://github.com/openshift/ovn-kubernetes/pull/1907

Bug OCPBUGS-19218: Update 4.15 ose-aws-cloud-controller-manager image to be consistent with ART

Please review the following PR: https://github.com/openshift/cloud-provider-aws/pull/48

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cloud-provider-aws/pull/48

Bug OCPBUGS-19550: Limit multus pod watch to pods on the local node

Multus doesn't need to watch pods on other nodes. To save memory and CPU, set MULTUS_NODE_NAME to filter the pods that Multus watches.
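Filtering by node amounts to a spec.nodeName field selector on the pod watch. Multus itself is written in Go; this Python sketch only shows how such a selector could be derived from MULTUS_NODE_NAME, assuming the variable is populated with the node name (for example through the downward API). The helper name is hypothetical.

```python
import os
from typing import Mapping, Optional


def local_pod_field_selector(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a field selector limiting a pod watch to this node, or None.

    None means MULTUS_NODE_NAME is unset, so the caller would watch all pods.
    """
    env = os.environ if env is None else env
    node = env.get("MULTUS_NODE_NAME")
    return f"spec.nodeName={node}" if node else None
```

With the Kubernetes Python client, the resulting string could be passed as the field_selector argument of CoreV1Api.list_pod_for_all_namespaces, so the API server only sends pods scheduled on the local node.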

https://github.com/openshift/cluster-network-operator/pull/2020

Bug OCPBUGS-21642: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-capi-operator/pull/132

Bug OCPBUGS-23768: After PatternFly5 update: Navigation: Extra space after divider

Issue 33 from https://docs.google.com/spreadsheets/d/1TR3ENY-GE_LQL9F-xH6NahtRHdu6IrYGhLqDDA8y0EI/edit#gid=1035185624

In the left navigation menu of the Developer perspective, there is extra space after the divider.

Screenshot: https://drive.google.com/file/d/1ROcHXCLmPPhr30nGTUblMTL-JQqKEsCY/view?usp=drive_link

https://github.com/openshift/console/pull/13362

Bug OCPBUGS-19286: Update 4.15 ose-installer-artifacts image to be consistent with ART

Please review the following PR: https://github.com/openshift/installer/pull/7496

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/installer/pull/7496

Bug OCPBUGS-23126: 'hcp destroy' command can leave HostedCluster hanging indefinitely during cleanup

Description of problem:

A user destroying a HostedCluster can cause the HostedCluster to hang indefinitely if the destroy command times out during execution.

This is because the hcp CLI places a finalizer on the HostedCluster during deletion, which the CLI later removes after waiting for some cleanup actions to complete. If a user cancels the `hcp destroy cluster` command (or the command times out) while the CLI is waiting for cleanup, the HostedCluster hangs indefinitely with a DeletionTimestamp != nil.

The CLI tool should not put the HostedCluster into an unreconcilable state. All of this finalizer cleanup logic belongs on the backend.
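The failure mode follows from how finalizers work: a deleted object is only marked with a deletion timestamp, and it is removed once every finalizer is gone. The toy Python model below is not a Kubernetes client, and the finalizer name is hypothetical; it shows why a finalizer owned by a short-lived CLI process is fragile, whereas a controller on the backend keeps reconciling until it can remove it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeObject:
    """Toy model of Kubernetes deletion with finalizers (no real API calls)."""
    finalizers: List[str] = field(default_factory=list)
    deletion_timestamp: Optional[str] = None
    exists: bool = True

    def delete(self, now: str) -> None:
        # A delete request only sets the deletion timestamp while finalizers remain.
        if self.deletion_timestamp is None:
            self.deletion_timestamp = now
        self._remove_if_unblocked()

    def remove_finalizer(self, name: str) -> None:
        if name in self.finalizers:
            self.finalizers.remove(name)
        self._remove_if_unblocked()

    def _remove_if_unblocked(self) -> None:
        # The object is actually removed once it is marked and has no finalizers.
        if self.deletion_timestamp is not None and not self.finalizers:
            self.exists = False


# Hypothetical finalizer name standing in for the one the hcp CLI adds.
hc = FakeObject(finalizers=["example.com/cli-cleanup"])
hc.delete("t0")
# If the CLI is aborted here, nothing ever calls remove_finalizer, so hc keeps
# its deletion timestamp and never goes away.
```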

Version-Release number of selected component (if applicable):

4.14

How reproducible:

100%

Steps to Reproduce:

1. Create an HCP cluster.
2. Destroy the HCP cluster with the CLI tool and immediately abort the CLI process.

Actual results:

HostedCluster is stuck indefinitely during deletion

Expected results:

HostedCluster is able to delete despite the cli being cancelled.

Additional info:

related to https://access.redhat.com/support/cases/#/case/03660218

https://github.com/openshift/hypershift/pull/3234

Bug OCPBUGS-25596: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-node-tuning-operator/pull/888

Bug OCPBUGS-18876: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/installer/pull/7513

Bug OCPBUGS-19095: Update 4.15 ose-cluster-olm-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-olm-operator/pull/31

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-olm-operator/pull/31

Bug OCPBUGS-21621: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Bug OCPBUGS-21744: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-config-operator/pull/366

Bug OCPBUGS-15538: Address manager primary node IP constantly being "updated"

Description of problem:

It was seen in downstream and upstream that ovn-controller was constantly restarting. This was due to ovnkube-node telling it to exit after it thought that the encap IP (the primary node IP) had changed.

This has been mitigated by:
https://github.com/ovn-org/ovn-kubernetes/pull/3711

But we still need to know why the c.nodePrimaryAddrChanged() function is returning true when nothing is really changing on the node. Example after the fix above:

ovn-control-plane/ovn-kubernetes/ovnkube.log:I0627 22:37:02.020612 1670 node_ip_handler_linux.go:212] Node primary address changed to 172.18.0.3. Updating OVN encap IP.
ovn-control-plane/ovn-kubernetes/ovnkube.log:I0627 22:37:02.037852 1670 node_ip_handler_linux.go:343] Will not update encap IP, value: 172.18.0.3 is the already configured
ovn-control-plane/ovn-kubernetes/ovnkube.log:I0627 23:03:03.115881 16698 node_ip_handler_linux.go:212] Node primary address changed to 172.18.0.3. Updating OVN encap IP.
ovn-control-plane/ovn-kubernetes/ovnkube.log:I0627 23:03:03.122365 16698 node_ip_handler_linux.go:343] Will not update encap IP, value: 172.18.0.3 is the already configured
ovn-control-plane/ovn-kubernetes/ovnkube.log:I0627 23:18:08.381694 27220 node_ip_handler_linux.go:212] Node primary address changed to 172.18.0.3. Updating OVN encap IP.
ovn-control-plane/ovn-kubernetes/ovnkube.log:I0627 23:18:08.389655 27220 node_ip_handler_linux.go:343] Will not update encap IP, value: 172.18.0.3 is the already configured
ovn-control-plane/ovn-kubernetes/ovnkube.log:I0627 23:19:26.638221 28746 node_ip_handler_linux.go:212] Node primary address changed to 172.18.0.3. Updating OVN encap IP.
ovn-control-plane/ovn-kubernetes/ovnkube.log:I0627 23:19:26.644217 28746 node_ip_handler_linux.go:343] Will not update encap IP, value: 172.18.0.3 is the already configured

This can be observed in kind deployments as well.

Version-Release number of selected component (if applicable):

Could affect versions earlier than 4.14

https://github.com/openshift/ovn-kubernetes/pull/1935

Bug OCPBUGS-17041: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/operator-framework-olm/pull/556

Bug OCPBUGS-18089: Don't set SSL connection on DBs anymore with OVN-IC

SB and NB containers have this command to expose their DB via SSL and set the inactivity probe interval. With OVN-IC we don't use SSL for the DBs anymore, so we can remove that bit.

if ! retry 60 "inactivity-probe" "ovn-sbctl --no-leader-only -t 5 set-connection pssl:${OVN_SB_PORT}${LISTEN_DUAL_STACK} -- set connection . inactivity_probe=${OVN_CONTROLLER_INACTIVITY_PROBE}"; then

should become:

if ! retry 60 "inactivity-probe" "ovn-sbctl --no-leader-only -t 5 set connection . inactivity_probe=${OVN_CONTROLLER_INACTIVITY_PROBE}"; then

Also, we can clean up the comment at the end where it polls the IPsec status, which is just a way of making sure the DB is ready and answering queries. We don't need to wait for the cluster to converge (since there's no Raft), so the comment could change to:

"Kill some time while DB becomes ready by checking IPsec status"

Bug OCPBUGS-25460: Private endpoint creation does not work on cluster created with minimal permissions

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/cluster-image-registry-operator/pull/978

Bug OCPBUGS-19282: Update 4.15 ose-tools image to be consistent with ART

Please review the following PR: https://github.com/openshift/oc/pull/1545

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/oc/pull/1545

Bug OCPBUGS-22541: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/alibaba-disk-csi-driver-operator/pull/70

Bug OCPBUGS-17987: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-authentication-operator/pull/629

Bug OCPBUGS-18469: Azure Image Registry Operator Making too Many Storage Account List Calls

Description of problem:

By default, the image registry operator on Azure runs two replicas. Every 5 minutes, each of those replicas calls the StorageAccount List operation for the image registry storage account.

Azure publishes throttling limits for storage account operations: 100 List calls every 5 minutes per subscription and region pair.

Because of this, customers are limited to <50 clusters per subscription and region in Azure.  This number can change based on the number of image registry replicas as well as customer activity on List storage account operations within that subscription and region.  
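The ceiling follows directly from the numbers above. The Python sketch below computes it under stated assumptions (one List call per replica per 5-minute window, and no other List traffic in the subscription and region); any extra List activity pushes the practical limit below this bound, consistent with the figure of fewer than 50 clusters.

```python
def max_clusters(list_limit: int = 100, replicas: int = 2, calls_per_replica: int = 1) -> int:
    """Upper bound on clusters per subscription/region within one 5-minute window.

    Assumes each image registry operator replica makes `calls_per_replica`
    StorageAccount List calls per window and nothing else calls List.
    """
    return list_limit // (replicas * calls_per_replica)


print(max_clusters())            # 50 with the default two replicas
print(max_clusters(replicas=3))  # 33
```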

On the Azure Red Hat OpenShift managed service, customers (including internal customers running demos) occasionally exceed these limits, which prevents them from creating new clusters within the subscription and region.

Version-Release number of selected component (if applicable):

N/A

How reproducible:

Always.

Steps to Reproduce:

1. Scale up the number of image registry pods to hit the 100 / 5 minute List limit (50 replicas, or enough clusters within a given subscription & region)
2. Attempt to create a new cluster
3. Cluster installation may fail due to image-registry cluster operator never going healthy, or the installer not being able to generate a storage account key for the bootstrap node to fetch its ignition config.

Actual results:

storage.AccountsClient#ListAccountSAS: Failure responding to request: StatusCode=429 -- Original Error: autorest/azure: Service returned an error. Status=429 Code="TooManyRequests" Message="The request is being throttled as the limit has been reached for operation type - Read_ObservationWindow_00:05:00. For more information, see - https://aka.ms/srpthrottlinglimits"

Expected results:

Cluster installs successfully

Additional info:

Raising this as a bug because, once the threshold is exceeded, the issue persists across all cluster installations. It also impacts image-registry pod health.

https://github.com/openshift/cluster-image-registry-operator/pull/912

Bug OCPBUGS-18846: Update 4.15 golang-github-prometheus-alertmanager image to be consistent with ART

Please review the following PR: https://github.com/openshift/prometheus-alertmanager/pull/75

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Bug OCPBUGS-20499: openshift-gcp-routes.sh exits prematurely, causing critical systemd service restarts

This test reports failures shortly after a node reboots. Of course the node isn't ready; it just rebooted.

: [sig-node] nodes should not go unready after being upgraded and go unready only once

{ 1 nodes violated upgrade expectations: Node ci-op-q38yw8yd-8aaeb-lsqxj-master-0 went unready multiple times: 2023-10-11T21:58:45Z, 2023-10-11T22:05:45Z Node ci-op-q38yw8yd-8aaeb-lsqxj-master-0 went ready multiple times: 2023-10-11T21:58:46Z, 2023-10-11T22:07:18Z }

At both of those times, master-0 had been rebooted or was being rebooted.

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/2060/pull-ci-openshift-cluster-network-operator-master-e2e-gcp-ovn-upgrade/1712203703311667200

https://github.com/openshift/machine-config-operator/pull/3977

Bug OCPBUGS-24157: Update 4.15 ose-vsphere-cloud-controller-manager-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cloud-provider-vsphere/pull/58

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cloud-provider-vsphere/pull/58

Bug OCPBUGS-24781: Update 4.16 ironic-agent-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/ironic-agent-image/pull/97

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/ironic-agent-image/pull/97

Bug OCPBUGS-16597: A Master Machine is stuck in deleting state after replacing the network by a wrong one in CPMS and updating it back

Description of problem:

After updating a CPMS CR with a non-existent network, a machine is stuck in the Provisioning state.
When the CPMS is then updated back to the previous network, the master Machine is stuck in the Deleting state.

Logs from the machine api controller:
I0720 13:03:58.894171       1 controller.go:187] ostest-2pwfk-master-xwprn-0: reconciling Machine
I0720 13:03:58.902876       1 controller.go:231] ostest-2pwfk-master-xwprn-0: reconciling machine triggers delete
E0720 13:04:00.200290       1 controller.go:255] ostest-2pwfk-master-xwprn-0: failed to delete machine: filter matched no resources
E0720 13:04:00.200499       1 controller.go:329]  "msg"="Reconciler error" "error"="filter matched no resources" "controller"="machine-controller" "name"="ostest-2pwfk-master-xwprn-0" "namespace"="openshift-machine-api" "object"={"name":"ostest-2pwfk-master-xwprn-0","namespace":"openshift-machine-api"} "reconcileID"="9ccb5885-4b9f-4190-95a2-1120f2566c52"

Version-Release number of selected component (if applicable):

OCP 4.14.0-0.nightly-2023-07-18-085740
RHOS-17.1-RHEL-9-20230712.n.1

How reproducible:

100%

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/machine-api-provider-openstack/pull/90

Bug OCPBUGS-19080: SNO failed upgrade (4.13-> 4.14) because console operator is not available

Description of problem:

An upgrade of 3480 SNOs from 4.13.11 to 4.14.0-rc.0 was attempted, and 15 SNOs ended up stuck in a partial upgrade because the console cluster operator was not available.

# cat 4.14.0-rc.0-partial.console | xargs -I % sh -c "echo -n '% '; oc --kubeconfig /root/hv-vm/kc/%/kubeconfig get clusterversion --no-headers"
vm00255 version   4.13.11   True   True   21h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm00320 version   4.13.11   True   True   21h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm00327 version   4.13.11   True   True   21h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm00405 version   4.13.11   True   True   21h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm00705 version   4.13.11   True   True   21h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm01224 version   4.13.11   True   True   19h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm01310 version   4.13.11   True   True   19h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm01320 version   4.13.11   True   True   19h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm01928 version   4.13.11   True   True   19h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm02052 version   4.13.11   True   True   19h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm02588 version   4.13.11   True   True   17h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm02704 version   4.13.11   True   True   17h   Unable to apply 4.14.0-rc.0: wait has exceeded 40 minutes for these operators: console
vm02835 version   4.13.11   True   True   17h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm03110 version   4.13.11   True   True   15h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm03322 version   4.13.11   True   True   15h   Unable to apply 4.14.0-rc.0: wait has exceeded 40 minutes for these operators: console
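
With thousands of SNOs it can help to group the stuck clusters by failure message. The following is a minimal Go sketch, assuming each line has the same shape as the output above (cluster name prepended by the echo, followed by the six `oc get clusterversion --no-headers` columns); the function name `tallyReasons` and the reason buckets are hypothetical, not part of any existing tool.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// tallyReasons groups clusterversion lines by their status message.
// Each line is assumed to look like:
//   <cluster> version <version> <available> <progressing> <since> <message...>
// The first six whitespace-separated fields are fixed; the rest is the message.
func tallyReasons(output string) map[string]int {
	counts := make(map[string]int)
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 7 {
			continue // skip blank or malformed lines
		}
		msg := strings.Join(fields[6:], " ")
		switch {
		case strings.Contains(msg, "is not available"):
			counts["operator not available"]++
		case strings.Contains(msg, "wait has exceeded"):
			counts["wait exceeded"]++
		default:
			counts["other"]++
		}
	}
	return counts
}

func main() {
	sample := `vm00255 version   4.13.11   True   True   21h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm02704 version   4.13.11   True   True   17h   Unable to apply 4.14.0-rc.0: wait has exceeded 40 minutes for these operators: console`
	fmt.Println(tallyReasons(sample))
}
```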

Version-Release number of selected component (if applicable):

SNO OCP (managed clusters being upgraded) 4.13.11 upgraded to 4.14.0-rc.0
Hub OCP 4.13.12
ACM - 2.9.0-DOWNSTREAM-2023-09-07-04-47-52

How reproducible:

15 out of 3489 SNOs being upgraded. However, these made up 15 of the 41 partial upgrade failures (~36% of the failures).

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/console-operator/pull/796

Bug OCPBUGS-23528: hypershift destroy command fails when removing destroy finalizer

Description of problem:

Attempting to destroy an AWS cluster can result in an error such as:

2023-11-21T15:04:15Z	INFO	Deleted role	{"role": "53375835bafc21240c89-mgmt-worker-role"}
2023-11-21T15:04:15Z	INFO	Deleting Secrets	{"namespace": "clusters"}
2023-11-21T15:04:15Z	INFO	Deleted CLI generated secrets
2023-11-21T15:04:15Z	ERROR	Failed to destroy cluster	{"error": "failed to remove finalizer: HostedCluster.hypershift.openshift.io \"53375835bafc21240c89-mgmt\" is invalid: metadata.finalizers: Forbidden: no new finalizers can be added if the object is being deleted, found new finalizers []string{\"hypershift.io/aws-oidc-discovery\"}"}
github.com/spf13/cobra.(*Command).execute
	/hypershift/vendor/github.com/spf13/cobra/command.go:916
github.com/spf13/cobra.(*Command).ExecuteC
	/hypershift/vendor/github.com/spf13/cobra/command.go:1044
github.com/spf13/cobra.(*Command).Execute
	/hypershift/vendor/github.com/spf13/cobra/command.go:968
github.com/spf13/cobra.(*Command).ExecuteContext
	/hypershift/vendor/github.com/spf13/cobra/command.go:961
main.main
	/hypershift/main.go:70
runtime.main
	/usr/local/go/src/runtime/proc.go:250
Error: failed to remove finalizer: HostedCluster.hypershift.openshift.io "53375835bafc21240c89-mgmt" is invalid: metadata.finalizers: Forbidden: no new finalizers can be added if the object is being deleted, found new finalizers []string{"hypershift.io/aws-oidc-discovery"}
failed to remove finalizer: HostedCluster.hypershift.openshift.io "53375835bafc21240c89-mgmt" is invalid: metadata.finalizers: Forbidden: no new finalizers can be added if the object is being deleted, found new finalizers []string{"hypershift.io/aws-oidc-discovery"}

Version-Release number of selected component (if applicable):

4.15

How reproducible:

Occasionally

Steps to Reproduce:

1. create hosted AWS cluster
2. destroy cluster with `hypershift destroy cluster aws`

Actual results:

In some cases, the destroy fails with the error shown in the description.

Expected results:

The destroy does not fail while removing the destroy finalizer
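
One plausible reading of the error (not confirmed in this report) is that the destroy path writes back a stale copy of the HostedCluster whose finalizer list still contains `hypershift.io/aws-oidc-discovery` after a controller has already removed it; because the API server forbids adding finalizers to an object that is being deleted, that write is rejected. The Go sketch below illustrates the usual remedy under that assumption: build the update from the most recently read finalizer list and remove only the target finalizer. The helpers `withoutFinalizer` and `addsFinalizers` and the finalizer name `example.io/destroy` are hypothetical, not HyperShift code.

```go
package main

import "fmt"

// withoutFinalizer returns a copy of current with every occurrence of name
// removed. current should be the finalizer list of the freshest read of the
// object; building the update from a stale copy can re-add finalizers that
// another controller already removed.
func withoutFinalizer(current []string, name string) []string {
	out := make([]string, 0, len(current))
	for _, f := range current {
		if f != name {
			out = append(out, f)
		}
	}
	return out
}

// addsFinalizers reports whether updated contains a finalizer that is not in
// current, i.e. whether the update would be rejected for an object that is
// already being deleted.
func addsFinalizers(current, updated []string) bool {
	have := make(map[string]bool, len(current))
	for _, f := range current {
		have[f] = true
	}
	for _, f := range updated {
		if !have[f] {
			return true
		}
	}
	return false
}

func main() {
	const destroyFinalizer = "example.io/destroy" // hypothetical name
	stale := []string{destroyFinalizer, "hypershift.io/aws-oidc-discovery"}
	fresh := []string{destroyFinalizer} // the OIDC finalizer was already removed

	fmt.Println(addsFinalizers(fresh, withoutFinalizer(stale, destroyFinalizer))) // prints true
	fmt.Println(addsFinalizers(fresh, withoutFinalizer(fresh, destroyFinalizer))) // prints false
}
```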

Additional info:

https://github.com/openshift/hypershift/pull/3219

Bug OCPBUGS-27071: HCP does not deploy cloud provider kubevirt with configured node selectors

This is a clone of issue ~~OCPBUGS-25696~~. The following is the description of the original issue:
—
Description of problem:

    When deploying an HCP KubeVirt cluster with the hcp CLI's --node-selector argument, that node selector is not applied to the "kubevirt-cloud-controller-manager" pods in the HCP namespace.

This makes it impossible to pin all HCP pods to specific nodes.

Version-Release number of selected component (if applicable):

    4.14

How reproducible:

    100%

Steps to Reproduce:

    1. deploy an hcp kubevirt cluster with the --node-selector cli option
    2.
    3.

Actual results:

    The node selector is not applied to the cloud provider kubevirt pod.

Expected results:

    The node selector should be applied to the cloud provider kubevirt pod.
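
The expected behavior amounts to overlaying the cluster-wide selector onto every control-plane pod's own selector. A minimal Go sketch using plain maps; `mergeNodeSelector` is a hypothetical helper, and the rule that cluster-wide keys win on conflict is an assumption, not necessarily how HyperShift resolves it.

```go
package main

import "fmt"

// mergeNodeSelector returns the selector for one control-plane pod: the
// component's own selector overlaid with the cluster-wide selector passed via
// --node-selector. Cluster-wide keys win on conflict (assumption), so every
// HCP pod, including kubevirt-cloud-controller-manager, would land on the
// selected nodes. Neither input map is modified.
func mergeNodeSelector(component, clusterWide map[string]string) map[string]string {
	merged := make(map[string]string, len(component)+len(clusterWide))
	for k, v := range component {
		merged[k] = v
	}
	for k, v := range clusterWide {
		merged[k] = v
	}
	return merged
}

func main() {
	ccm := map[string]string{}                          // hypothetical: CCM pod had no selector
	hcp := map[string]string{"example.com/hcp": "true"} // hypothetical --node-selector value
	fmt.Println(mergeNodeSelector(ccm, hcp))
}
```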

Additional info:

https://github.com/openshift/hypershift/pull/3417

Bug OCPBUGS-27192: Remove NCv2 series from azure doc tested_instance_types_x86_64

Description of problem:

According to the Azure documentation [1], NCv2-series Azure virtual machines (VMs) were retired on September 6, 2023, and VMs can no longer be provisioned on those instance types.

Therefore, remove standardNCSv2Family from the Azure doc tested_instance_types_x86_64 for 4.13+.

[1] https://learn.microsoft.com/en-us/azure/virtual-machines/ncv2-series

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1. Cluster installation fails on an NCv2-series instance type
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/installer/pull/7911

Bug OCPBUGS-18841: OCP-57089 and OCP-24504 failed in 4.14 azure platform for the load-balancer service couldn't get an external-IP address

Description of problem:

The automated test OCP-57089 failed on a 4.14 Azure platform. A manual check showed that the created load-balancer service couldn't get an external-IP address.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-09-09-164123

How reproducible:

100% on the cluster

Steps to Reproduce:

1. Add a wait in the auto script, then run the case
      g.By("check if the lb services have obtained the EXTERNAL-IPs")
      regExp := "([0-9]+.[0-9]+.[0-9]+.[0-9]+)"
      time.Sleep(3600 * time.Second) 
% ./bin/extended-platform-tests run all --dry-run | grep 57089 | ./bin/extended-platform-tests run -f -

2.
% oc get ns | grep e2e-test-router
e2e-test-router-ingressclass-n2z2c                 Active   2m51s 

3. The EXTERNAL-IP column showed <pending> for the internal-lb-57089 service
% oc -n e2e-test-router-ingressclass-n2z2c get svc
NAME                TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)           AGE
external-lb-57089   LoadBalancer   172.30.198.7    20.42.34.61   28443:30193/TCP   3m6s
internal-lb-57089   LoadBalancer   172.30.214.30   <pending>     29443:31507/TCP   3m6s
service-secure      ClusterIP      172.30.47.70    <none>        27443/TCP         3m13s
service-unsecure    ClusterIP      172.30.175.59   <none>        27017/TCP         3m13s
% 

4.
% oc -n e2e-test-router-ingressclass-n2z2c get svc internal-lb-57089 -oyaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
  creationTimestamp: "2023-09-12T07:56:42Z"
  finalizers:
  - service.kubernetes.io/load-balancer-cleanup
  name: internal-lb-57089
  namespace: e2e-test-router-ingressclass-n2z2c
  resourceVersion: "209376"
  uid: b163bc03-b1c6-4e7b-b4e1-c996e9d135f4
spec:
  allocateLoadBalancerNodePorts: true
  clusterIP: 172.30.214.30
  clusterIPs:
  - 172.30.214.30
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: https
    nodePort: 31507
    port: 29443
    protocol: TCP
    targetPort: 8443
  selector:
    name: web-server-rc
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer: {}
%

Actual results:

internal-lb-57089 service couldn't get an external-IP address

Expected results:

The internal-lb-57089 service should get an external-IP address.

Additional info:

Bug OCPBUGS-20338: New Feature in 4.14 - Node Dashboard in OCP

Description of problem:

In RHOCP 4.14, the new Node dashboard feature does not show the expected metric/dashboard data.

[hjaiswal@hjaiswal 4_14]$ oc get nodes
NAME                                             STATUS     ROLES                  AGE     VERSION
ip-10-0-26-232.ap-southeast-1.compute.internal   Ready      control-plane,master   6h12m   v1.27.6+1648878
ip-10-0-42-100.ap-southeast-1.compute.internal   Ready      control-plane,master   6h12m   v1.27.6+1648878
ip-10-0-46-197.ap-southeast-1.compute.internal   Ready      worker                 6h3m    v1.27.6+1648878
ip-10-0-66-225.ap-southeast-1.compute.internal   NotReady   worker                 6h3m    v1.27.6+1648878
ip-10-0-8-20.ap-southeast-1.compute.internal     Ready      worker                 6h5m    v1.27.6+1648878
ip-10-0-80-84.ap-southeast-1.compute.internal    Ready      control-plane,master   6h12m   v1.27.6+1648878

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Steps to Reproduce:

1. Check that all nodes are in the Ready state (cluster version 4.14).
2. SSH into or debug any worker node.
3. Stop the kubelet service.
4. Check that the node goes into the NotReady state.
5. Open the OpenShift console, go to Observe > Dashboards, and select the new "Node cluster" dashboard.
6. It shows "0" nodes in the NotReady state, but it should show "1" node in the NotReady state.

Actual results:

The Node cluster dashboard shows no count for the NotReady node.

Expected results:

The Node cluster dashboard should show 1 NotReady node.
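
To cross-check the dashboard against the CLI, the NotReady nodes can be counted from `oc get nodes` output; for the listing in the description this gives 1. A minimal Go sketch; `countNotReady` is a hypothetical helper, and it assumes STATUS is the second column, as in the default `oc get nodes` table.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countNotReady counts nodes whose STATUS column starts with "NotReady" in
// `oc get nodes` output, so "NotReady,SchedulingDisabled" is counted too.
// The header line is skipped.
func countNotReady(output string) int {
	n := 0
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 2 || fields[0] == "NAME" {
			continue
		}
		if strings.HasPrefix(fields[1], "NotReady") {
			n++
		}
	}
	return n
}

func main() {
	listing := `NAME                                             STATUS     ROLES    AGE    VERSION
ip-10-0-46-197.ap-southeast-1.compute.internal   Ready      worker   6h3m   v1.27.6+1648878
ip-10-0-66-225.ap-southeast-1.compute.internal   NotReady   worker   6h3m   v1.27.6+1648878`
	fmt.Println(countNotReady(listing)) // prints 1
}
```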

Additional info:

Tested in AWS IPI cluster

https://github.com/openshift/machine-config-operator/pull/3964

Bug HYPBLD-99: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/hypershift/pull/2997

Bug OCPBUGS-129: [OCP web console] Unable to select/change log component under master node's logs section once user made any selection.

Description of problem:

Once a user changes the log component in a master node's Logs section, the user can no longer change or select a different log component from the dropdown.

To select a different log component, the user needs to revisit the Logs section of the master node, which refreshes the pane and resets it to the default options.

Version-Release number of selected components (if applicable):

4.11.0-0.nightly-2022-08-15-152346

How reproducible:

Always

Steps to Reproduce:

Login to OCP web console.
Go to Compute > Nodes > Click on one of the master nodes.
Go to the Logs section.
Change the dropdown value from journal to openshift-apiserver (also select audit log).
Try to change the dropdown value from openshift-apiserver to journal/kube-apiserver/oauth-apiserver.
View the behavior.

Actual results:

Unable to select or change the log component once the user already made a selection from the dropdown under master nodes' logs section.

Expected results:

Users should be able to change or select the log component in a master node's Logs section whenever required, using the available dropdown.

Additional info:

Reproduced in both Chrome [103.0.5060.114 (Official Build) (64-bit)] and Firefox [91.11.0esr (64-bit)].
Attached a screen capture of the same: ScreenRecorder_2022-08-16_26457662-aea5-4a00-aeb4-0fbddf8f16f0.mp4

https://github.com/openshift/console/pull/13092

Bug OCPBUGS-19635: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/ovn-kubernetes/pull/1942

Bug MGMT-15950: Fix DNS wilcard domain validation

Description of the problem:
Fix DNS wildcard domain validation.
The DNS wildcard domain check starts in validateNoWildcardDNS. The domain may have an optional trailing dot,
but the current code assumes that the trailing dot is mandatory.
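
A minimal sketch of the intended comparison, assuming the validation compares a wildcard name against an expected domain: treat one trailing dot as optional rather than mandatory. `normalizeDomain` and `sameDomain` are hypothetical helpers, not assisted-service functions.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeDomain lower-cases a DNS name and drops a single optional
// trailing dot, so "example.com." and "example.com" compare equal.
func normalizeDomain(name string) string {
	return strings.ToLower(strings.TrimSuffix(name, "."))
}

// sameDomain reports whether two DNS names refer to the same domain,
// treating the trailing dot as optional.
func sameDomain(a, b string) bool {
	return normalizeDomain(a) == normalizeDomain(b)
}

func main() {
	fmt.Println(sameDomain("*.apps.example.com.", "*.apps.example.com")) // prints true
}
```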

How reproducible:

Steps to reproduce:

Actual results:

Expected results:

https://github.com/openshift/assisted-service/pull/5544

Bug OCPBUGS-22639: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/vmware-vsphere-csi-driver-operator/pull/185

Bug OCPBUGS-11179: Network operator should be compliant with CIS benchmark rule

Description of problem:

The network operator is not compliant with the CIS benchmark rule "Ensure Usage of Unique Service Accounts" [1], which is part of the "ocp4-cis" profile used by the Compliance Operator [2]. The network operator uses the default service account, which is the one that applies when no other service account is specified. OpenShift core operators should be compliant with the CIS benchmark, i.e. they should run with their own service account rather than the "default" one.

A similar bug was raised for the machine-config operator.

[1] https://static.open-scap.org/ssg-guides/ssg-ocp4-guide-cis.html#xccdf_org.ssgproject.content_group_accounts
[2] https://docs.openshift.com/container-platform/4.11/security/compliance_operator/compliance-operator-supported-profiles.html

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1.
2.
3.

Actual results:

Network operator using default SA
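
To find pods that still run as the default service account, one option is to inspect `spec.serviceAccountName` in the output of `oc get pods -n <namespace> -o json`. A minimal Go sketch; `podsUsingDefaultSA` is a hypothetical helper, and the empty-string case covers manifests where no account was set.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// podList models only the fields needed from `oc get pods -o json`.
type podList struct {
	Items []struct {
		Metadata struct {
			Name string `json:"name"`
		} `json:"metadata"`
		Spec struct {
			ServiceAccountName string `json:"serviceAccountName"`
		} `json:"spec"`
	} `json:"items"`
}

// podsUsingDefaultSA returns the names of pods that run as the "default"
// service account, either explicitly or because none was specified.
func podsUsingDefaultSA(data []byte) ([]string, error) {
	var pods podList
	if err := json.Unmarshal(data, &pods); err != nil {
		return nil, err
	}
	var names []string
	for _, p := range pods.Items {
		if sa := p.Spec.ServiceAccountName; sa == "" || sa == "default" {
			names = append(names, p.Metadata.Name)
		}
	}
	return names, nil
}

func main() {
	data := []byte(`{"items":[
		{"metadata":{"name":"operator-a"},"spec":{"serviceAccountName":"default"}},
		{"metadata":{"name":"operator-b"},"spec":{"serviceAccountName":"operator-b"}}]}`)
	names, err := podsUsingDefaultSA(data)
	fmt.Println(names, err) // prints [operator-a] <nil>
}
```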

Expected results:

Additional info:

https://github.com/openshift/cluster-network-operator/pull/2084

Bug OCPBUGS-19115: Update 4.15 ose-kubevirt-csi-driver-rhel8 image to be consistent with ART

Please review the following PR: https://github.com/openshift/kubevirt-csi-driver/pull/23

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/kubevirt-csi-driver/pull/23

Bug OCPBUGS-24079: Update 4.15 openshift-state-metrics-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/openshift-state-metrics/pull/111

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/openshift-state-metrics/pull/111

Task MGMT-15762: In a full cluster (3 masters, 3 workers), ODF validation fails if masters have a small disk

Assisted environment: SaaS (console.redhat.com)
Interface: REST API
OCP version:
Configuration:
3 masters, 3 workers
3 masters having a small extra disk (2GB) for etcd
3 workers having an extra disk 100GB

The ODF validations fail when checking the masters' small disks; increasing the size of the etcd disk solves the issue. The validation code: https://github.com/openshift/assisted-service/blob/7e715004c9a4c77e056bd91fe698f7f68232418f/internal/operators/odf/validations.go#L162
The code should check only the workers when the cluster is not a compact cluster.
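
The expected host selection can be sketched as follows, assuming a compact cluster is one with no worker hosts (so the masters run ODF) and that every other topology runs ODF on the workers only. `Host` and `hostsForODFValidation` are hypothetical, trimmed-down names, not the assisted-service model.

```go
package main

import "fmt"

// Host is a hypothetical, trimmed-down host record.
type Host struct {
	Name string
	Role string // "master" or "worker"
}

// hostsForODFValidation picks the hosts whose disks should be validated for
// ODF: in a compact cluster (no workers) the masters are checked; otherwise
// only the workers are, so small master disks (e.g. a 2GB etcd disk) are
// ignored.
func hostsForODFValidation(hosts []Host) []Host {
	var masters, workers []Host
	for _, h := range hosts {
		switch h.Role {
		case "master":
			masters = append(masters, h)
		case "worker":
			workers = append(workers, h)
		}
	}
	if len(workers) == 0 {
		return masters
	}
	return workers
}

func main() {
	hosts := []Host{
		{"master-0", "master"}, {"master-1", "master"}, {"master-2", "master"},
		{"worker-0", "worker"}, {"worker-1", "worker"}, {"worker-2", "worker"},
	}
	for _, h := range hostsForODFValidation(hosts) {
		fmt.Println(h.Name) // only the workers are listed
	}
}
```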

https://github.com/openshift/assisted-service/pull/5529

Bug OCPBUGS-21741: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-baremetal-operator/pull/368

Bug OCPBUGS-24149: Update 4.15 cluster-monitoring-operator-container image to be consistent with ART

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/cluster-monitoring-operator/pull/2170

Bug OCPBUGS-19228: Update 4.15 ose-machine-api-provider-gcp image to be consistent with ART

Please review the following PR: https://github.com/openshift/machine-api-provider-gcp/pull/58

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/machine-api-provider-gcp/pull/58

Bug OCPBUGS-21789: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/image-customization-controller/pull/103

Bug OCPBUGS-22569: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-csi-snapshot-controller-operator/pull/173

Bug OCPBUGS-24323: MSTeams receiver with empty title/text triggers prometheus operator panic

Pre-requisites:

UWM enabled with AlertmanagerConfig support.

The following AlertmanagerConfig object will trigger a panic of the UWM prometheus operator:

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: alertmanager-config
  labels:
    resource: prometheus
spec:
  route:
    groupBy: ["..."]
    groupWait: 1m
    groupInterval: 1m
    repeatInterval: 12h
    receiver: "default_channel"
    routes:
      - matchers:
        - matchType: =
          name: severity
          value: warning
        receiver: teams
  receivers:
    - name: "default_channel"
    - name: teams
      msteamsConfigs:
        - webhookUrl:
            name: alertmanager-teams
            key: webhook

See https://github.com/prometheus-operator/prometheus-operator/issues/6082

Bug OCPBUGS-19280: Update 4.15 ose-must-gather image to be consistent with ART

Please review the following PR: https://github.com/openshift/must-gather/pull/381

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/must-gather/pull/381

Bug OCPBUGS-24678: ODF Dynamic plugin should not expose Server header

This is a clone of issue OCPBUGS-24186. The following is the description of the original issue:
—
Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/console/pull/13427

Bug OCPBUGS-25921: [OVN][IPSEC] ovn-ipsec-host pods got deleted when there is a NotReady node

This is a clone of issue OCPBUGS-25337. The following is the description of the original issue:
—
Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:

Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.

Affected Platforms:

Is it an

internal CI failure
customer issue / SD
internal RedHat testing failure

If it is an internal RedHat testing failure:

Please share a kubeconfig or creds to a live cluster for the assignee to debug/troubleshoot along with reproducer steps (especially if it's a telco use case like ICNI, secondary bridges or BM+kubevirt).

If it is a CI failure:

Did it happen in different CI lanes? If so please provide links to multiple failures with the same error instance
Did it happen in both sdn and ovn jobs? If so please provide links to multiple failures with the same error instance
Did it happen in other platforms (e.g. aws, azure, gcp, baremetal etc) ? If so please provide links to multiple failures with the same error instance
When did the failure start happening? Please provide the UTC timestamp of the networking outage window from a sample failure run
If it's a connectivity issue,
What is the srcNode, srcIP and srcNamespace and srcPodName?
What is the dstNode, dstIP and dstNamespace and dstPodName?
What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)

If it is a customer / SD issue:

Provide enough information in the bug description that Engineering doesn’t need to read the entire case history.
Don’t presume that Engineering has access to Salesforce.
Please provide must-gather and sos-report with an exact link to the comment in the support case with the attachment. The format should be: https://access.redhat.com/support/cases/#/case/<case number>/discussion?attachmentId=<attachment id>
Describe what each attachment is intended to demonstrate (failed pods, log errors, OVS issues, etc).
Referring to the attached must-gather, sosreport or other attachment, please provide the following details:
- If the issue is in a customer namespace then provide a namespace inspect.
- If it is a connectivity issue:
  - What is the srcNode, srcNamespace, srcPodName and srcPodIP?
  - What is the dstNode, dstNamespace, dstPodName and dstPodIP?
  - What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)
  - Please provide the UTC timestamp of the networking outage window from must-gather
  - Please provide tcpdump pcaps taken during the outage filtered based on the above provided src/dst IPs
- If it is not a connectivity issue:
  - Describe the steps taken so far to analyze the logs from networking components (cluster-network-operator, OVNK, SDN, openvswitch, ovs-configure etc) and the actual component where the issue was seen based on the attached must-gather. Please attach snippets of relevant logs around the window when the problem happened, if any.

For OCPBUGS in which the issue has been identified, label with “sbr-triaged”
For OCPBUGS in which the issue has not been identified and needs Engineering help for root cause, label with “sbr-untriaged”
Note: bugs that do not meet these minimum standards will be closed with label “SDN-Jira-template”

https://github.com/openshift/cluster-network-operator/pull/2181

Bug OCPBUGS-21641: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/251

Bug OCPBUGS-21822: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/hypershift/pull/3103

Bug OCPBUGS-23347: VSphereConnectionForm link uncorrect resources

Description of problem:

In the vSphere connection configuration UI, the vCenter cluster value is populated with the "networks" value instead of the "computeCluster" one.

Additional info:

- https://github.com/openshift/console/blob/fdcd7738612cd5685c100b15d348134c96b2fa39[...]ackages/vsphere-plugin/src/components/VSphereConnectionForm.tsx
- https://github.com/openshift/console/blob/fdcd7738612cd5685c100b15d348134c96b2fa39/frontend/packages/vsphere-plugin/src/hooks/use-connection-form.ts#L69

From the form query it seems it is linked to the Network:
======================================
vCenterCluster = domain?.topology?.networks?.[0] || '';
======================================

Our understanding is that it should pick up the cluster name:
======================================
topology.computeCluster
======================================

https://github.com/openshift/console/pull/13209

Bug OCPBUGS-24638: Tuned Profiles going degraded due to the extra net.core.rps_default_mask configuration in openshift-node-performance-xxx-profile

Description of problem:
Issue: Profiles are degraded [1] even after being applied, due to the error below [2]:

[1]

$oc get profile -A
NAMESPACE                                NAME                                          TUNED                APPLIED   DEGRADED   AGE
openshift-cluster-node-tuning-operator   master0    rdpmc-patch-master   True      True       5d
openshift-cluster-node-tuning-operator   master1    rdpmc-patch-master   True      True       5d
openshift-cluster-node-tuning-operator   master2    rdpmc-patch-master   True      True       5d
openshift-cluster-node-tuning-operator   worker0    rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker1    rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker10   rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker11   rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker12   rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker13   rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker14   rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker15   rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker2    rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker3    rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker4  rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker5    rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker6    rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker7    rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker8   rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker9   rdpmc-patch-worker   True      True       5d

[2]

    lastTransitionTime: "2023-12-05T22:43:12Z"
    message: TuneD daemon issued one or more sysctl override message(s) during profile
      application. Use reapply_sysctl=true or remove conflicting sysctl net.core.rps_default_mask
    reason: TunedSysctlOverride
    status: "True"

Looking at the profiles that use the rdpmc-patch-master Tuned:

NAMESPACE                                NAME                                          TUNED                APPLIED   DEGRADED   AGE
openshift-cluster-node-tuning-operator   master0    rdpmc-patch-master   True      True       5d
openshift-cluster-node-tuning-operator   master1    rdpmc-patch-master   True      True       5d
openshift-cluster-node-tuning-operator   master2    rdpmc-patch-master   True      True       5d

The rdpmc-patch-master Tuned is configured as follows:

$ oc get tuned rdpmc-patch-master -n openshift-cluster-node-tuning-operator -oyaml |less
spec:
  profile:
  - data: |
      [main]
      include=performance-patch-master
      [sysfs]
      /sys/devices/cpu/rdpmc = 2
    name: rdpmc-patch-master
  recommend:

The performance-patch-master Tuned, which is included by the Tuned above, contains:

spec:
  profile:
  - data: |
      [main]
      summary=Custom tuned profile to adjust performance
      include=openshift-node-performance-master-profile
      [bootloader]
      cmdline_removeKernelArgs=-nohz_full=${isolated_cores}

The following setting, which is the one reported in the error, is in openshift-node-performance-master-profile, which is included by the Tuned above:

net.core.rps_default_mask=${not_isolated_cpumask}

A RHEL bug has been raised for the same issue: https://issues.redhat.com/browse/RHEL-18972

Version-Release number of selected component (if applicable):

4.14

https://github.com/openshift/cluster-node-tuning-operator/pull/869

Bug OCPBUGS-21836: When accessing API URL, jwks_uri endpoint returned is not correct.

Description of problem:

When the URL https://api.test.lab.domain.com:6443/.well-known/openid-configuration is accessed,
the returned jwks_uri endpoint contains an api-int URL.
We expect this endpoint to be on api instead of api-int.

Version-Release number of selected component (if applicable):

4.11

How reproducible:

100%

Steps to Reproduce:

1. From web browser access https://api.test.lab.domain.com:6443/.well-known/openid-configuration
2. From CLI try curl -kvv https://api.test.lab.domain.com:6443/.well-known/openid-configuration
3. The output is shown below. The returned jwks_uri points to api-int, but I think it should point to api.
~~~~~
{"issuer":"https://kubernetes.default.svc","jwks_uri":"https://api-int.test.lab.domain.com:6443/openid/v1/jwks","response_types_supported":["id_token"],"subject_types_supported":["public"],"id_token_signing_alg_values_supported":["RS256"]} 
~~~~~
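To check a cluster for this behavior, a saved response from the `curl` command in step 2 can be inspected. A minimal Python sketch, assuming the `api-int.` hostname prefix seen in this report is what distinguishes the internal endpoint; the function names are hypothetical.

```python
import json
from urllib.parse import urlparse

# Response body copied from step 3 above.
DISCOVERY = (
    '{"issuer":"https://kubernetes.default.svc",'
    '"jwks_uri":"https://api-int.test.lab.domain.com:6443/openid/v1/jwks",'
    '"response_types_supported":["id_token"],'
    '"subject_types_supported":["public"],'
    '"id_token_signing_alg_values_supported":["RS256"]}'
)


def jwks_host(discovery_json):
    """Return the hostname part of jwks_uri from an OpenID discovery document."""
    return urlparse(json.loads(discovery_json)["jwks_uri"]).hostname


def points_to_internal_api(discovery_json):
    """True if jwks_uri uses the api-int.* host rather than the external api.* host."""
    return (jwks_host(discovery_json) or "").startswith("api-int.")


print(jwks_host(DISCOVERY), points_to_internal_api(DISCOVERY))
```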

Actual results:

"jwks_uri":"https://api-int.test.lab.domain.com:6443/openid/v1/jwks"

Expected results:

"jwks_uri":"https://api.test.lab.domain.com:6443/openid/v1/jwks"

Additional info:

https://github.com/openshift/cluster-kube-apiserver-operator/pull/1578

Bug OCPBUGS-24743: Remove CRI-O-update-triggered image wipe

Description of problem:

Since many 4.y releases ago (before 4.11, and so in every minor version that is still supported), CRI-O has wiped images when it comes up after a node reboot and notices it has a new (minor?) version. This causes redundant pulls, as seen in this 4.11-to-4.12 update run:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-azure-sdn-upgrade/1732741139229839360/artifacts/e2e-azure-sdn-upgrade/gather-extra/artifacts/nodes/ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4/journal | zgrep 'Starting update from rendered-\|crio-wipe\|Pulled image: registry.ci.openshift.org/ocp/4.12-2023-12-07-060628@sha256:3c3e67faf4b6e9e95bebb0462bd61c964170893cb991b5c4de47340a2f295dc2'
Dec 07 13:05:42.474144 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 systemd[1]: crio-wipe.service: Succeeded.
Dec 07 13:05:42.481470 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 systemd[1]: crio-wipe.service: Consumed 191ms CPU time
Dec 07 13:59:51.000686 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 crio[1498]: time="2023-12-07 13:59:51.000591203Z" level=info msg="Pulled image: registry.ci.openshift.org/ocp/4.12-2023-12-07-060628@sha256:3c3e67faf4b6e9e95bebb0462bd61c964170893cb991b5c4de47340a2f295dc2" id=a62bc972-67d7-401a-9640-884430bd16f1 name=/runtime.v1.ImageService/PullImage
Dec 07 14:00:55.745095 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 root[101294]: machine-config-daemon[99469]: Starting update from rendered-worker-ca36a33a83d49b43ed000fd422e09838 to rendered-worker-c0b3b4eadfe6cdfb595b97fa293a9204: &{osUpdate:true kargs:false fips:false passwd:false files:true units:true kernelType:false extensions:false}
Dec 07 14:05:33.274241 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 systemd[1]: crio-wipe.service: Succeeded.
Dec 07 14:05:33.289605 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 systemd[1]: crio-wipe.service: Consumed 216ms CPU time
Dec 07 14:14:50.277011 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 crio[1573]: time="2023-12-07 14:14:50.276961087Z" level=info msg="Pulled image: registry.ci.openshift.org/ocp/4.12-2023-12-07-060628@sha256:3c3e67faf4b6e9e95bebb0462bd61c964170893cb991b5c4de47340a2f295dc2" id=1a092fbd-7ffa-475a-b0b7-0ab115dbe173 name=/runtime.v1.ImageService/PullImage

The redundant pulls cost network and disk traffic, and avoiding them should make update-initiated reboots quicker and cheaper. Dropping the update-initiated wipes is not expected to cost much, because the Kubelet's old-image garbage collection should still clear out any no-longer-used images if disk space gets tight.
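To spot the redundant pulls on a node without reading the whole journal, a sketch like the following counts `crio-wipe` runs and repeated `Pulled image` entries. It assumes the journal text has already been saved and decompressed locally; the sample lines are shortened, hypothetical stand-ins with the same shape as the excerpt above.

```python
import re
from collections import Counter

# Captures the image reference inside msg="Pulled image: ...".
PULLED = re.compile(r'msg="Pulled image: ([^"]+)"')

REF = ("registry.ci.openshift.org/ocp/4.12-2023-12-07-060628"
       "@sha256:3c3e67faf4b6e9e95bebb0462bd61c964170893cb991b5c4de47340a2f295dc2")
JOURNAL = [
    "Dec 07 13:05:42 node systemd[1]: crio-wipe.service: Succeeded.",
    f'Dec 07 13:59:51 node crio[1498]: level=info msg="Pulled image: {REF}" id=a',
    "Dec 07 14:05:33 node systemd[1]: crio-wipe.service: Succeeded.",
    f'Dec 07 14:14:50 node crio[1573]: level=info msg="Pulled image: {REF}" id=b',
]


def summarize(lines):
    """Count crio-wipe runs and return image references pulled more than once."""
    wipes = 0
    pulls = Counter()
    for line in lines:
        if "crio-wipe.service: Succeeded" in line:
            wipes += 1
        match = PULLED.search(line)
        if match:
            pulls[match.group(1)] += 1
    return wipes, {ref: n for ref, n in pulls.items() if n > 1}


print(summarize(JOURNAL))
```

A reference that appears with a count of 2 or more after a wipe is a candidate redundant pull of the kind this bug describes.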

Version-Release number of selected component (if applicable):

At least 4.11. Possibly older 4.y; I haven't checked.

How reproducible:

Every time.

Steps to Reproduce:

1. Install a cluster.
2. Update to a release image with a different CRI-O (minor?) version.
3. Check logs on the nodes.

Actual results:

crio-wipe entries in the logs, with reports of target-release images being pulled before and after those wipes, as I quoted in the Description.

Expected results:

Target-release images pulled before the reboot, and found in the local cache if that image is needed again post-reboot.

https://github.com/openshift/machine-config-operator/pull/4068

Bug OCPBUGS-20063: Regenerating the machine config operator certificates can panic on vSphere

Description of problem:

An Infrastructure object in some vSphere deployments can look like this:

~]$ oc get infrastructure cluster -o json | jq .status
{
  "apiServerInternalURI": "xxx",
  "apiServerURL": "xxx",
  "controlPlaneTopology": "HighlyAvailable",
  "etcdDiscoveryDomain": "",
  "infrastructureName": "xxx",
  "infrastructureTopology": "HighlyAvailable",
  "platform": "VSphere",
  "platformStatus": {
    "type": "VSphere" 
  }
}

If we run the MCO certificate regeneration command from https://access.redhat.com/articles/regenerating_cluster_certificates against such a cluster, it panics.
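The panic is consistent with the command reading a platform-specific sub-object under `platformStatus`, which this Infrastructure object does not have (only `type` is set). The defensive pattern is to treat each nested level as optional. A Python sketch of that pattern, assuming (hypothetically) a `platformStatus.vsphere.apiServerInternalIPs` lookup; this report does not say which field the command actually reads, and the real fix is in `oc`, which is written in Go.

```python
import json

# Abbreviated copy of the status block above ("xxx" placeholders kept).
STATUS = json.loads("""
{
  "apiServerInternalURI": "xxx",
  "apiServerURL": "xxx",
  "platform": "VSphere",
  "platformStatus": {"type": "VSphere"}
}
""")


def nested_get(obj, *keys):
    """Walk nested dict keys, returning None instead of failing at a missing level."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


# Hypothetical field path: platformStatus.vsphere is absent in this object.
ips = nested_get(STATUS, "platformStatus", "vsphere", "apiServerInternalIPs")
print(ips if ips is not None else "no vsphere platformStatus; skip or use defaults")
```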

Version-Release number of selected component (if applicable):

4.10.65
4.11.47
4.12.29
4.13.8
4.14.0
4.15

How reproducible:

100%

Steps to Reproduce:

1. Run the procedure on a cluster with the Infrastructure object shown above
2.
3.

Actual results:

panic

Expected results:

no panic

Additional info:

https://github.com/openshift/oc/pull/1555

Story METAL-730: ironic-image sync 2023-10

This should happen after we add the IPv6 CI jobs.

https://github.com/openshift/ironic-image/pull/407

Bug OCPBUGS-18893: pods assigned with Multus whereabouts IP get stuck in ContainerCreating state after OCP upgrading

Description of problem:

Pods assigned a Multus whereabouts IP get stuck in the ContainerCreating state after upgrading OCP from 4.12.15 to 4.12.22. It is not clear whether the upgrade itself or the node reboot causes the issue.

The error message is:
(combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox mypod-0-0-1-0_testproject_8c8500e1-1643-4716-8fd7-e032292c62ab_0(2baa045a1b19291769ed56bab288b60802179ff3138ffe0d16a14e78f9cb5e4f): error adding pod testproject_mypod-0-0-1-0 to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [testproject/mypod-0-0-1-0/8c8500e1-1643-4716-8fd7-e032292c62ab:testproject-net-svc-kernel-bond]: error adding container to network "testproject-net-svc-kernel-bond": error at storage engine: k8s get error: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline

Version-Release number of selected component (if applicable):

How reproducible:

Not sure if it is reproducible

Steps to Reproduce:

1.
2.
3.

Actual results:

Pods stuck in ContainerCreating state

Expected results:

Pods are created normally

Additional info:

The customer reported that deleting and recreating the StatefulSet did not help.
The pods are created normally after manually deleting the corresponding ippools.whereabouts.cni.cncf.io object:
$ oc delete ippools.whereabouts.cni.cncf.io 172.21.24.0-22 -n openshift-multus
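The IPPool name in the workaround, `172.21.24.0-22`, is the range `172.21.24.0/22` with the slash replaced by a dash. A Python sketch for deriving the pool name from a NetworkAttachmentDefinition range before deleting it, assuming that naming convention; the `:`-to-`-` substitution for IPv6 ranges is an assumption not shown in this report, and `ippool_name` is a hypothetical helper.

```python
import ipaddress


def ippool_name(ip_range):
    """Map a whereabouts range to its IPPool name (assumed: '/' and ':' become '-')."""
    ipaddress.ip_network(ip_range)  # validate only; raises ValueError if malformed
    return ip_range.replace(":", "-").replace("/", "-")


name = ippool_name("172.21.24.0/22")
print(f"oc delete ippools.whereabouts.cni.cncf.io {name} -n openshift-multus")
```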

Bug OCPBUGS-24035: On an SNO the new CA certificate is not loaded after updating user-ca-bundle configmap

Description of problem:

On an SNO, a new CA certificate is not loaded after updating the user-ca-bundle configmap, and as a result the cluster cannot pull images from a registry whose certificate is signed by the new CA.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1. Update ca-bundle.crt (replace it with a new certificate if applicable) in the `user-ca-bundle` configmap in the openshift-config namespace:
   * On the node, ensure that /etc/pki/ca-trust/source/anchors/openshift-config-user-ca-bundle.crt was updated with the new certificate.
2. Create a pod that uses an image from a registry whose certificate is signed by the new CA certificate provided in ca-bundle.crt.
3.

Actual results:

Pod fails to pull the image:
  * Failed to pull image "registry.ztp-hub-00.mobius.lab.eng.rdu2.redhat.com:5000/centos/centos:8": rpc error: code = Unknown desc = pinging container registry registry.ztp-hub-00.mobius.lab.eng.rdu2.redhat.com:5000: Get "https://registry.ztp-hub-00.mobius.lab.eng.rdu2.redhat.com:5000/v2/": tls: failed to verify certificate: x509: certificate signed by unknown authority
  * On the node, try to reach the registry with curl https://registry.ztp-hub-00.mobius.lab.eng.rdu2.redhat.com:5000/
    * Certificate validation fails:
      curl: (60) SSL certificate problem: self-signed certificate
      More details here: https://curl.se/docs/sslcerts.html

To be able to create a pod, I had to:
  * Run `sudo update-ca-trust`. After that, curl https://registry.ztp-hub-00.mobius.lab.eng.rdu2.redhat.com:5000/ worked without issues, but pod creation still failed with the "tls: failed to verify certificate: x509: certificate signed by unknown authority" error.
  * Run `sudo systemctl restart crio`. After that, pod creation succeeded and the image could be pulled.

Expected results:

Additional info:

Attaching a must-gather.

https://github.com/openshift/machine-config-operator/pull/4050

Bug MGMT-15980: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/assisted-service/pull/5569

Bug OCPBUGS-22947: oc-mirror panic when use v2

Description of problem:

oc-mirror panics when using v2 to mirror from disk to a registry.

Version-Release number of selected component (if applicable):

How reproducible:

always

Steps to Reproduce:

1. Create the ImageSetConfiguration we are using:
cat config.yaml 
apiVersion: mirror.openshift.io/v1alpha2
kind: ImageSetConfiguration
mirror:
  platform:
    channels:
      - name: stable-4.13
        minVersion: 4.13.13
        maxVersion: 4.13.13
    graph: true
2. Mirror to disk with the command:
`oc-mirror --config config.yaml file://out  --v2`
3. Mirror from disk to the registry with the command:
`oc-mirror --config config.yaml  --from out/working-dir/ docker://ec2-18-217-139-237.us-east-2.compute.amazonaws.com:5000/ocpv2  --v2`

Actual results:
oc-mirror --from out/working-dir/ docker://ec2-18-217-139-237.us-east-2.compute.amazonaws.com:5000/ocpv2 --v2
--v2 flag identified, flow redirected to the oc-mirror v2 version. PLEASE DO NOT USE that. V2 is still under development and it is not ready to be used.
2023/11/06 03:10:19 [ERROR] : use the --config flag it is mandatory
[root@preserve-fedora36 1106]# oc-mirror --config config.yaml --from out/working-dir/ docker://ec2-18-217-139-237.us-east-2.compute.amazonaws.com:5000/ocpv2 --v2
--v2 flag identified, flow redirected to the oc-mirror v2 version. PLEASE DO NOT USE that. V2 is still under development and it is not ready to be used.
panic: runtime error: index out of range [1] with length 1

goroutine 1 [running]:
github.com/openshift/oc-mirror/v2/pkg/cli.(*ExecutorSchema).Complete(0xc000c28a80, {0xc00012cd20, 0x1, 0x0?})
/go/src/github.com/openshift/oc-mirror/vendor/github.com/openshift/oc-mirror/v2/pkg/cli/executor.go:330 +0x1a18
github.com/openshift/oc-mirror/v2/pkg/cli.NewMirrorCmd.func1(0xc000005500?, {0xc00012cd20, 0x1, 0x6})
/go/src/github.com/openshift/oc-mirror/vendor/github.com/openshift/oc-mirror/v2/pkg/cli/executor.go:137 +0xfd
github.com/spf13/cobra.(*Command).execute(0xc000005500, {0xc000052080, 0x6, 0x6})
/go/src/github.com/openshift/oc-mirror/vendor/github.com/spf13/cobra/command.go:944 +0x847
github.com/spf13/cobra.(*Command).ExecuteC(0xc000005500)
/go/src/github.com/openshift/oc-mirror/vendor/github.com/spf13/cobra/command.go:1068 +0x3bd
github.com/spf13/cobra.(*Command).Execute(0x0?)
/go/src/github.com/openshift/oc-mirror/vendor/github.com/spf13/cobra/command.go:992 +0x19
main.main()
/go/src/github.com/openshift/oc-mirror/cmd/oc-mirror/main.go:10 +0x1e

Expected results:

No panic
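The trace shows `Complete` indexing element [1] of a one-element argument slice: with `--from`, the only positional argument is the `docker://` destination. A minimal sketch of the defensive pattern, written in Python rather than the tool's Go; the `positional` helper is hypothetical and not part of oc-mirror. It checks the length before indexing and reports a usage error instead of crashing.

```python
def positional(args, index, name):
    """Return args[index], or raise a usage error instead of an out-of-range crash."""
    if index >= len(args):
        raise ValueError(
            f"missing argument <{name}>: expected at least {index + 1} "
            f"positional argument(s), got {len(args)}"
        )
    return args[index]


# With --from, the only positional argument is the docker:// destination.
args = ["docker://ec2-18-217-139-237.us-east-2.compute.amazonaws.com:5000/ocpv2"]
print(positional(args, 0, "destination"))
try:
    positional(args, 1, "destination")
except ValueError as err:
    print("error:", err)
```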

Additional info:

https://github.com/openshift/oc-mirror/pull/725

Bug OCPBUGS-19018: sdn container failing to start on okd-scos

Using metal-ipi on 4.14, the cluster fails to come up.

The network cluster operator fails to start, and the sdn pod shows the error:

bash: RHEL_VERSION: unbound variable

https://github.com/openshift/cluster-network-operator/pull/2003

Bug OCPBUGS-22847: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Bug OCPBUGS-19174: Update 4.15 prometheus-config-reloader image to be consistent with ART

Please review the following PR: https://github.com/openshift/prometheus-operator/pull/243

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/prometheus-operator/pull/243

Bug OCPBUGS-23140: install cannot be go on if the apiVIP and ingressVIP are same ip when using external LB

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/installer/pull/7803

Bug OCPBUGS-19705: Do not use port 9106 for ovnkube-control-plane metrics

In order to avoid possible issues with SDN during migration from SDN to OVNK, do not use port 9106 for ovnkube-control-plane metrics, since it's already used by SDN. Use a port that is not used by SDN, such as 9108.

https://github.com/openshift/cluster-network-operator/pull/2031

Bug OCPBUGS-15583: MachineConfig rollout after Control-Plane Node(s) CPU and Memory update because of nodeStatusUpdateFrequency being updated

Description of problem:

After adding CPU and Memory to the OpenShift Container Platform 4 - Control-Plane Node(s), it was noticed that a new MachineConfig was rolled out, causing all OpenShift Container Platform 4 - Node(s) to reboot unexpectedly.

Interestingly, no new MachineConfig was rendered; instead, a slightly older MachineConfig was picked and applied to all OpenShift Container Platform 4 - Node(s) after the change on the OpenShift Container Platform 4 - Control-Plane Node(s) was performed.

The only visible change found in the MachineConfig was that nodeStatusUpdateFrequency was updated from 10s to 0s even though nodeStatusUpdateFrequency is not specified or configured in any MachineConfig or KubeletConfig.

https://issues.redhat.com/browse/OCPBUGS-6723 was found, but given that the affected OpenShift Container Platform 4 - Cluster is running 4.11.35, it's difficult to understand what happened, as this problem was generally believed to be solved.

Version-Release number of selected component (if applicable):

OpenShift Container Platform 4.11.35

How reproducible:

Unknown

Steps to Reproduce:

1. OpenShift Container Platform 4 on AWS
2. Updating OpenShift Container Platform 4 - Control-Plane Node(s) to add more CPU and Memory 
3. Check whether a potential MachineConfig update is being applied

Actual results:

A MachineConfig update is rolled out to all OpenShift Container Platform 4 - Node(s) after adding CPU and Memory to the OpenShift Container Platform 4 - Control-Plane Node(s), because nodeStatusUpdateFrequency is updated. This is unexpected, and it is not clear why it happens.

Expected results:

Either no new MachineConfig is rolled out after such a change, or a newly rendered MachineConfig is rolled out with information about what changed and why the change was applied.

Additional info:

https://github.com/openshift/machine-config-operator/pull/3890

Bug OCPBUGS-20104: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-network-operator/pull/2051

Bug OCPBUGS-25238: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/azure-file-csi-driver-operator/pull/90

Story CORS-2525: Azure: remove storage account with bootstrap destroy

User Story:

I would like for the Azure storage account to be destroyed as part of the bootstrap destroy process, so that the storage account is not persisted for the life of the cluster which incurs costs and other management effort.

Acceptance Criteria:

Description of criteria:

Goal: storage account is destroyed with other bootstrap resources

Engineering Details:

The storage account holds three different types of resources:
- boot diagnostic logs - we can get rid of these by using a managed storage account
- bootstrap ignition - used to create the bootstrap vm, is already destroyed as part of bootstrap destroy
- RHCOS VHD - used to create a VM image version
The main effort for this story will be to figure out how to create the gallery image (using the VHD in the storage account) before the bootstrap stage, but delete the storage account (and VHD) along with the other bootstrap resources

https://github.com/openshift/installer/pull/7642

Bug OCPBUGS-19418: OCP upgrade 4.13 to 4.14 fails with: an unknown error has occurred: MultipleErrors

Description of problem:

OCP Upgrades fail with message "Upgrade error from 4.13.X: Unable to apply 4.14.0-X: an unknown error has occurred: MultipleErrors"

Version-Release number of selected component (if applicable):

Currently 4.14.0-rc.1, but we observed the same issue with previous 4.14 nightlies too: 
4.14.0-0.nightly-2023-09-12-195514
4.14.0-0.nightly-2023-09-02-132842
4.14.0-0.nightly-2023-08-28-154013

How reproducible:

1 out of 2 upgrades

Steps to Reproduce:

1. Deploy OCP 4.13 with latest GA on a baremetal cluster with IPI and OVN-K
2. Upgrade to latest 4.14 available
3. Check cluster version status during the upgrade, at some point upgrade stops with message: "Upgrade error from 4.13.X Unable to apply 4.14.0-X: an unknown error has occurred: MultipleErrors"
4. Check OVN pods with "oc get pods -n openshift-ovn-kubernetes": some pods are running 7 out of 8 containers (missing ovnkube-node) and constantly restarting, and pods running only 5 containers show errors connecting to the OVN DBs.
5. Check cluster operators with "oc get co": mainly dns, network, and machine-config remained at 4.13 and degraded.

Actual results:

Upgrade not completed, and OVN pods remain in a restarting loop with failures.

Expected results:

Upgrade should be completed without issues, and OVN pods should remain in a Running status without restarts.

Additional info:

We have tested this with the latest GA versions of 4.13 (as of today, Sep 19: 4.13.13 to 4.14.0-rc1), but we have been observing this for the past 20 days with previous versions of 4.13 and 4.14.
Our deployments are single-stack IPv4, with one NIC for provisioning and one NIC for baremetal (machine network).

These are the results from our latest test from 4.13.13 to 4.14.0-rc1

$ oc get clusterversion
NAME     VERSION  AVAILABLE  PROGRESSING  SINCE  STATUS
version           True       True         2h8m   Unable to apply 4.14.0-rc.1: an unknown error has occurred: MultipleErrors

$ oc get mcp
NAME    CONFIG                                            UPDATED  UPDATING  DEGRADED  MACHINECOUNT  READYMACHINECOUNT  UPDATEDMACHINECOUNT  DEGRADEDMACHINECOUNT  AGE
master  rendered-master-ebb1da47ad5cb76c396983decb7df1ea  True     False     False     3             3                  3                    0                     3h41m
worker  rendered-worker-26ccb35941236935a570dddaa0b699db  False    True      True      3             2                  2                    1                     3h41m

$ oc get co
NAME                                      VERSION      AVAILABLE  PROGRESSING  DEGRADED  SINCE
authentication                            4.14.0-rc.1  True       False        False     2h21m
baremetal                                 4.14.0-rc.1  True       False        False     3h38m
cloud-controller-manager                  4.14.0-rc.1  True       False        False     3h41m
cloud-credential                          4.14.0-rc.1  True       False        False     2h23m
cluster-autoscaler                        4.14.0-rc.1  True       False        False     2h21m
config-operator                           4.14.0-rc.1  True       False        False     3h40m
console                                   4.14.0-rc.1  True       False        False     2h20m
control-plane-machine-set                 4.14.0-rc.1  True       False        False     3h40m
csi-snapshot-controller                   4.14.0-rc.1  True       False        False     2h21m
dns                                       4.13.13      True       True         True      2h9m
etcd                                      4.14.0-rc.1  True       False        False     2h40m
image-registry                            4.14.0-rc.1  True       False        False     2h9m
ingress                                   4.14.0-rc.1  True       True         True      1h14m
insights                                  4.14.0-rc.1  True       False        False     3h34m
kube-apiserver                            4.14.0-rc.1  True       False        False     2h35m
kube-controller-manager                   4.14.0-rc.1  True       False        False     2h30m
kube-scheduler                            4.14.0-rc.1  True       False        False     2h29m
kube-storage-version-migrator             4.14.0-rc.1  False      True         False     2h9m
machine-api                               4.14.0-rc.1  True       False        False     2h24m
machine-approver                          4.14.0-rc.1  True       False        False     3h40m
machine-config                            4.13.13      True       False        True      59m
marketplace                               4.14.0-rc.1  True       False        False     3h40m
monitoring                                4.14.0-rc.1  False      True         True      2h3m
network                                   4.13.13      True       True         True      2h4m
node-tuning                               4.14.0-rc.1  True       False        False     2h9m
openshift-apiserver                       4.14.0-rc.1  True       False        False     2h20m
openshift-controller-manager              4.14.0-rc.1  True       False        False     2h20m
openshift-samples                         4.14.0-rc.1  True       False        False     2h23m
operator-lifecycle-manager                4.14.0-rc.1  True       False        False     2h23m
operator-lifecycle-manager-catalog        4.14.0-rc.1  True       False        False     2h18m
operator-lifecycle-manager-packageserver  4.14.0-rc.1  True       False        False     2h20m
service-ca                                4.14.0-rc.1  True       False        False     2h23m
storage                                   4.14.0-rc.1  True       False        False     3h40m
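To see at a glance which operators are holding the upgrade back, the `oc get co` table can be filtered for operators that are not at the target version, not Available, or Degraded. A Python sketch, assuming the column layout shown above (no MESSAGE column); the sample is an abbreviated copy and `lagging_operators` is a hypothetical helper.

```python
# Abbreviated copy of the `oc get co` output above (same columns).
CO_TABLE = """
NAME                           VERSION      AVAILABLE  PROGRESSING  DEGRADED  SINCE
authentication                 4.14.0-rc.1  True       False        False     2h21m
dns                            4.13.13      True       True         True      2h9m
ingress                        4.14.0-rc.1  True       True         True      1h14m
kube-storage-version-migrator  4.14.0-rc.1  False      True         False     2h9m
network                        4.13.13      True       True         True      2h4m
"""


def lagging_operators(table, target):
    """Names of operators not at `target`, not Available, or Degraded."""
    names = []
    for row in table.strip().splitlines()[1:]:  # skip the header row
        name, version, available, _progressing, degraded = row.split()[:5]
        if version != target or available != "True" or degraded == "True":
            names.append(name)
    return names


print(lagging_operators(CO_TABLE, "4.14.0-rc.1"))
```

Parsing the human-readable table is fragile; `oc get co -o json` exposes the same conditions in a stable structure and is the sturdier input for real tooling.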

Some OVN pods are running 7 out of 8 containers (missing ovnkube-node) and constantly restarting, and pods running only 5 containers show errors connecting to the OVN DBs.

$ oc get pods -n openshift-ovn-kubernetes -o wide
NAME                                    READY  STATUS   RESTARTS  AGE    IP             NODE
ovnkube-control-plane-5f5c598768-czkjv  2/2    Running  0         2h16m  192.168.16.32  dciokd-master-1
ovnkube-control-plane-5f5c598768-kg69r  2/2    Running  0         2h16m  192.168.16.31  dciokd-master-0
ovnkube-control-plane-5f5c598768-prfb5  2/2    Running  0         2h16m  192.168.16.33  dciokd-master-2
ovnkube-node-9hjv9                      5/5    Running  1         3h43m  192.168.16.32  dciokd-master-1
ovnkube-node-fmswc                      7/8    Running  19        2h10m  192.168.16.36  dciokd-worker-2
ovnkube-node-pcjhp                      7/8    Running  20        2h15m  192.168.16.35  dciokd-worker-1
ovnkube-node-q7kcj                      5/5    Running  1         3h43m  192.168.16.33  dciokd-master-2
ovnkube-node-qsngm                      5/5    Running  3         3h27m  192.168.16.34  dciokd-worker-0
ovnkube-node-v2d4h                      7/8    Running  20        2h15m  192.168.16.31  dciokd-master-0
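The same kind of filter applies to the pod listing: pods whose READY count is below the container total, or whose RESTARTS exceed a threshold. A Python sketch, assuming RESTARTS is a bare integer as in this output (newer `oc` versions can append text such as "(5m ago)", which this parser does not handle); the threshold of 5 and the helper name are arbitrary choices.

```python
# Abbreviated copy of the `oc get pods -n openshift-ovn-kubernetes -o wide` output above.
PODS_TABLE = """
NAME                                    READY  STATUS   RESTARTS  AGE    IP             NODE
ovnkube-control-plane-5f5c598768-czkjv  2/2    Running  0         2h16m  192.168.16.32  dciokd-master-1
ovnkube-node-9hjv9                      5/5    Running  1         3h43m  192.168.16.32  dciokd-master-1
ovnkube-node-fmswc                      7/8    Running  19        2h10m  192.168.16.36  dciokd-worker-2
ovnkube-node-pcjhp                      7/8    Running  20        2h15m  192.168.16.35  dciokd-worker-1
ovnkube-node-qsngm                      5/5    Running  3         3h27m  192.168.16.34  dciokd-worker-0
ovnkube-node-v2d4h                      7/8    Running  20        2h15m  192.168.16.31  dciokd-master-0
"""


def unhealthy_pods(table, max_restarts=5):
    """(name, ready, restarts, node) for pods not fully ready or over max_restarts."""
    found = []
    for row in table.strip().splitlines()[1:]:  # skip the header row
        name, ready, _status, restarts, _age, _ip, node = row.split()[:7]
        ready_count, total = (int(n) for n in ready.split("/"))
        if ready_count < total or int(restarts) > max_restarts:
            found.append((name, ready, int(restarts), node))
    return found


for pod in unhealthy_pods(PODS_TABLE):
    print(pod)
```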

$ oc logs ovnkube-node-9hjv9 -c ovnkube-node -n openshift-ovn-kubernetes | less
...
2023-09-19T03:40:23.112699529Z E0919 03:40:23.112660    5883 ovn_db.go:511] Failed to retrieve cluster/status info for database "OVN_Northbound", stderr: 2023-09-19T03:40:23Z|00001|unixctl|WARN|failed to connect to /var/run/ovn/ovnnb_db.ctl
2023-09-19T03:40:23.112699529Z ovn-appctl: cannot connect to "/var/run/ovn/ovnnb_db.ctl" (No such file or directory)
2023-09-19T03:40:23.112699529Z , err: (OVN command '/usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl --timeout=5 cluster/status OVN_Northbound' failed: exit status 1)
2023-09-19T03:40:23.112699529Z E0919 03:40:23.112677    5883 ovn_db.go:590] OVN command '/usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl --timeout=5 cluster/status OVN_Northbound' failed: exit status 1
2023-09-19T03:40:23.114791313Z E0919 03:40:23.114777    5883 ovn_db.go:283] Failed retrieving memory/show output for "OVN_NORTHBOUND", stderr: 2023-09-19T03:40:23Z|00001|unixctl|WARN|failed to connect to /var/run/ovn/ovnnb_db.ctl
2023-09-19T03:40:23.114791313Z ovn-appctl: cannot connect to "/var/run/ovn/ovnnb_db.ctl" (No such file or directory)
2023-09-19T03:40:23.114791313Z , err: (OVN command '/usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl --timeout=5 memory/show' failed: exit status 1)
2023-09-19T03:40:23.116492808Z E0919 03:40:23.116478    5883 ovn_db.go:511] Failed to retrieve cluster/status info for database "OVN_Southbound", stderr: 2023-09-19T03:40:23Z|00001|unixctl|WARN|failed to connect to /var/run/ovn/ovnsb_db.ctl
2023-09-19T03:40:23.116492808Z ovn-appctl: cannot connect to "/var/run/ovn/ovnsb_db.ctl" (No such file or directory)
2023-09-19T03:40:23.116492808Z , err: (OVN command '/usr/bin/ovn-appctl -t /var/run/ovn/ovnsb_db.ctl --timeout=5 cluster/status OVN_Southbound' failed: exit status 1)
2023-09-19T03:40:23.116492808Z E0919 03:40:23.116488    5883 ovn_db.go:590] OVN command '/usr/bin/ovn-appctl -t /var/run/ovn/ovnsb_db.ctl --timeout=5 cluster/status OVN_Southbound' failed: exit status 1
2023-09-19T03:40:23.118468064Z E0919 03:40:23.118450    5883 ovn_db.go:283] Failed retrieving memory/show output for "OVN_SOUTHBOUND", stderr: 2023-09-19T03:40:23Z|00001|unixctl|WARN|failed to connect to /var/run/ovn/ovnsb_db.ctl
2023-09-19T03:40:23.118468064Z ovn-appctl: cannot connect to "/var/run/ovn/ovnsb_db.ctl" (No such file or directory)
2023-09-19T03:40:23.118468064Z , err: (OVN command '/usr/bin/ovn-appctl -t /var/run/ovn/ovnsb_db.ctl --timeout=5 memory/show' failed: exit status 1)
2023-09-19T03:40:25.118085671Z E0919 03:40:25.118056    5883 ovn_northd.go:128] Failed to get ovn-northd status stderr() :(failed to run the command since failed to get ovn-northd's pid: open /var/run/ovn/ovn-northd.pid: no such file or directory)

https://github.com/openshift/cluster-network-operator/pull/2018

Bug OCPBUGS-20528: the manifest type *ocischema.DeserializedImageIndex is not supported

Description of problem:

It's blocking the Prow CI test: https://github.com/openshift/release/pull/42822#issuecomment-1760704535

[cloud-user@preserve-olm-env2 jian]$ oc image extract registry.ci.openshift.org/ocp/4.15:cli  --path /usr/bin/oc:. --confirm
[cloud-user@preserve-olm-env2 jian]$ sudo chmod 777 oc
[cloud-user@preserve-olm-env2 jian]$ 
[cloud-user@preserve-olm-env2 jian]$ ./oc version 
Client Version: v4.2.0-alpha.0-2030-g0307852
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
[cloud-user@preserve-olm-env2 jian]$ ./oc image mirror --insecure=true --skip-missing=true --skip-verification=true --keep-manifest-list=true --filter-by-os='.*' quay.io/openshifttest/ociimage:multiarch localhost:5000/olmqe/ociimage3:multiarch
localhost:5000/
  olmqe/ociimage3
    error: the manifest type *ocischema.DeserializedImageIndex is not supported
    manifests:
      sha256:d58e3e003ddec723dd14f72164beaa609d24c5e5e366579e23bc8b34b9a58324 -> multiarch
  stats: shared=0 unique=0 size=0B

error: the manifest type *ocischema.DeserializedImageIndex is not supported
error: an error occurred during planning
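The error names `*ocischema.DeserializedImageIndex`, which suggests the source tag resolves to an OCI image index (the OCI format for a multi-arch list) rather than a Docker manifest list. A Python sketch to check which kind a tag points to, assuming the raw manifest was saved first, for example with `skopeo inspect --raw docker://quay.io/openshifttest/ociimage:multiarch`. It only classifies the JSON and does not reproduce `oc`'s mirroring logic; `manifest_kind` is a hypothetical helper.

```python
import json

# Media types from the OCI image-spec and Docker's image manifest v2 schema 2.
OCI_INDEX = "application/vnd.oci.image.index.v1+json"
DOCKER_MANIFEST_LIST = "application/vnd.docker.distribution.manifest.list.v2+json"


def manifest_kind(raw):
    """Classify a raw top-level manifest as an OCI index, a Docker list, or other."""
    doc = json.loads(raw)
    media_type = doc.get("mediaType")
    if media_type == OCI_INDEX:
        return "oci-index"
    if media_type == DOCKER_MANIFEST_LIST:
        return "docker-manifest-list"
    # mediaType is optional in an OCI index, so fall back to its "manifests" field.
    if media_type is None and "manifests" in doc:
        return "oci-index"
    return "other"


print(manifest_kind('{"schemaVersion": 2, "mediaType": "%s", "manifests": []}' % OCI_INDEX))
```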

Version-Release number of selected component (if applicable):

The master branch of https://github.com/openshift/oc : https://github.com/openshift/oc/commit/03078525c97d612c2070081d0e9f322f946360f4

[cloud-user@preserve-olm-env2 jian]$ podman inspect  registry.ci.openshift.org/ocp/4.15:cli 
[
     {
          "Id": "feac27a180964dff0a0ff0a9fcdb593fcf87a7d80177e6c79ab804fb8477f55b",
          "Digest": "sha256:8fcc83d3c72c66867c38456a217298239d99626d96012dbece5c669e3ad5952c",
          "RepoTags": [
               "registry.ci.openshift.org/ocp/4.15:cli"
          ],
          "RepoDigests": [
               "registry.ci.openshift.org/ocp/4.15@sha256:8fcc83d3c72c66867c38456a217298239d99626d96012dbece5c669e3ad5952c",
               "registry.ci.openshift.org/ocp/4.15@sha256:cf4f54e2f20af19afe3c5c0685aa95ab3296d177204b01a3d8bfddf7c3d45f49"
          ],
...
                    "summary": "Provides the latest release of the Red Hat Extended Life Base Image.",
                    "url": "https://access.redhat.com/containers/#/registry.access.redhat.com/openshift/ose-base/images/v4.15.0-202310111407.p0.g16dbf5e.assembly.stream",
                    "vcs-ref": "03078525c97d612c2070081d0e9f322f946360f4",
                    "vcs-type": "git",
                    "vcs-url": "https://github.com/openshift/oc",
                    "vendor": "Red Hat, Inc.",
                    "version": "v4.15.0"
...
                    "created": "2023-10-12T23:06:08.279786979Z",
                    "created_by": "/bin/sh -c #(nop) LABEL \"io.openshift.build.name\"=\"cli-amd64\" \"io.openshift.build.namespace\"=\"ci-op-37527gwf\" \"io.openshift.build.commit.author\"=\"\" \"io.openshift.build.commit.date\"=\"\" \"io.openshift.build.commit.id\"=\"03078525c97d612c2070081d0e9f322f946360f4\" \"io.openshift.build.commit.message\"=\"\" \"io.openshift.build.commit.ref\"=\"master\" \"io.openshift.build.name\"=\"\" \"io.openshift.build.namespace\"=\"\" \"io.openshift.build.source-context-dir\"=\"\" \"io.openshift.build.source-location\"=\"https://github.com/openshift/oc\" \"io.openshift.ci.from.base\"=\"sha256:d7a2588527405101eeb1578a0e97e465ec83b0b927b71cf689703554e81cb585\" \"vcs-ref\"=\"03078525c97d612c2070081d0e9f322f946360f4\" \"vcs-type\"=\"git\" \"vcs-url\"=\"https://github.com/openshift/oc\"",

How reproducible:

always

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

4.15.0-0.nightly-2023-10-09-101435 (`oc` commit 1bbfec243e5910a5a86df985489700c3d3137aed) works well.

[cloud-user@preserve-olm-env2 client]$ ./oc version 
Client Version: 4.15.0-0.nightly-2023-10-09-101435
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3

[cloud-user@preserve-olm-env2 client]$ ./oc image mirror --insecure=true --skip-missing=true --skip-verification=true --keep-manifest-list=true --filter-by-os='.*' quay.io/openshifttest/ociimage:multiarch localhost:5000/olmqe/ociimage2:multiarch2
localhost:5000/
  olmqe/ociimage2
...
sha256:d58e3e003ddec723dd14f72164beaa609d24c5e5e366579e23bc8b34b9a58324 localhost:5000/olmqe/ociimage2:multiarch2
info: Mirroring completed in 2.47s (72.87MB/s)

[cloud-user@preserve-olm-env2 oc]$ oc adm release info registry.ci.openshift.org/ocp/release:4.15.0-0.nightly-2023-10-09-101435 --commits |grep oc 
Pull From: registry.ci.openshift.org/ocp/release@sha256:b5d1f88597d49d0e34ed4acfe3149817d02774d4c0661cbcb0c04896d1a852c6
...
  tools                                          https://github.com/openshift/oc                                             1bbfec243e5910a5a86df985489700c3d3137aed

https://github.com/openshift/oc/pull/1575

Bug OCPBUGS-16189: Dual-Stack Hosted Cluster: IPv6 should not be the default pod/service network IPFamily

View the Description View the linked PRs

Description of problem:

When deploying a dual-stack HostedCluster, the user can define networks like this:


  networking:
    clusterNetwork:      
    - cidr: fd01::/48             
      hostPrefix: 64
    - cidr: 10.132.0.0/14
      hostPrefix: 23
    networkType: OVNKubernetes             
    serviceNetwork:          
    - cidr: fd02::/112
    - cidr: 172.31.0.0/16

This leads to a misconfiguration on the hosted cluster: services have their ClusterIP set to the IPv6 family (the pod network still defaults to IPv4 regardless of the order).

When deploying a dual-stack cluster with the openshift-install binary, a validation is in place that prevents users from configuring IPv6 as the default network family:

ERROR failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: invalid "install-config.yaml" file: [networking.serviceNetwork: Invalid value: "fd02::/112, 172.30.0.0/16": IPv4 addresses must be listed before IPv6 addresses, networking.clusterNetwork: Invalid value: "fd01::/48, 10.132.0.0/14": IPv4 addresses must be listed before IPv6 addresses]

ERROR failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: invalid "install-config.yaml" file: networking.clusterNetwork: Invalid value: "fd01::/48, 10.132.0.0/14": IPv4 addresses must be listed before IPv6 addresses     

HyperShift should detect this and either block the cluster creation or swap the order so the cluster gets created with default IPv4 networks.
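
HyperShift could apply the same rule the installer enforces in the errors above. The following is a minimal Go sketch of both options (block or swap), assuming the CIDRs arrive as plain strings; `validateIPv4First` and `ipv4First` are hypothetical names, not HyperShift's actual code.

```go
package main

import (
	"fmt"
	"net/netip"
	"sort"
)

// validateIPv4First implements the "block" option: it returns an error if any
// IPv4 CIDR is listed after an IPv6 CIDR, or if an entry is not a valid CIDR.
func validateIPv4First(field string, cidrs []string) error {
	seenIPv6 := false
	for _, c := range cidrs {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			return fmt.Errorf("%s: invalid CIDR %q: %w", field, c, err)
		}
		if p.Addr().Is6() {
			seenIPv6 = true
		} else if seenIPv6 {
			return fmt.Errorf("%s: IPv4 addresses must be listed before IPv6 addresses", field)
		}
	}
	return nil
}

// ipv4First implements the "swap" option: it returns a reordered copy with all
// IPv4 CIDRs ahead of all IPv6 CIDRs, keeping the order within each family.
func ipv4First(cidrs []string) ([]string, error) {
	isIPv4 := make(map[string]bool, len(cidrs))
	for _, c := range cidrs {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			return nil, fmt.Errorf("invalid CIDR %q: %w", c, err)
		}
		isIPv4[c] = p.Addr().Is4()
	}
	out := append([]string(nil), cidrs...)
	// Stable sort so that entries of the same family keep their relative order.
	sort.SliceStable(out, func(i, j int) bool {
		return isIPv4[out[i]] && !isIPv4[out[j]]
	})
	return out, nil
}

func main() {
	// The serviceNetwork values from the HostedCluster example above.
	serviceNetwork := []string{"fd02::/112", "172.31.0.0/16"}
	if err := validateIPv4First("networking.serviceNetwork", serviceNetwork); err != nil {
		fmt.Println("block:", err)
	}
	reordered, err := ipv4First(serviceNetwork)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("swap:", reordered)
}
```

Swapping is friendlier to users, while blocking matches the installer's existing behavior; which one HyperShift should choose is left open in the report.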

Version-Release number of selected component (if applicable):

latest

How reproducible:

Always

Steps to Reproduce:

1. Deploy a HostedCluster with the networking settings above, using the image that includes the dual-stack patches: quay.io/jparrill/hypershift:OCPBUGS-15331-mix-413v12

Actual results:

The cluster gets deployed with IPv6 as the default family for the service network.

Expected results:

Cluster creation gets blocked, OR the cluster gets deployed with IPv4 as the default family for the service network.

Additional info:

https://github.com/openshift/hypershift/pull/3047

Bug OCPBUGS-19429: oc-mirror failed with an ImageSetConfiguration YAML containing two EUS channels

View the Description View the linked PRs

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/oc-mirror/pull/757

Bug OCPBUGS-17724: Unable to destroy cluster when AWS Organization SCP prevents use of iam:GetUser

View the Description View the linked PRs

Environment: OCP 4.12.24
Installation Method: IPI: Manual Mode + STS using a customer-provided AWS IAM role

I am trying to deploy an OCP4 cluster on AWS for my customer. The customer does not permit the creation of IAM users, so I am performing a Manual Mode with STS IPI installation instead. I have been given an IAM role to assume for the OCP installation, but unfortunately the customer's AWS Organizations Service Control Policy (SCP) does not permit the use of the iam:GetUser permission.

(I have informed my customer that iam:GetUser is an installation requirement, as clearly documented in our docs, and I have raised a ticket with their internal support team requesting that their SCP be amended to include iam:GetUser. However, I have been told that my request is likely to be rejected.)

With this limitation understood, I still attempted to install OCP4. Surprisingly, I was able to deploy an OCP (4.12) cluster without any apparent issues; however, when I tried to destroy the cluster, I encountered the following error from the installer (note: fields in brackets <> have been redacted):

DEBUG search for IAM roles
DEBUG iterating over a page of 74 IAM roles
DEBUG search for IAM users
DEBUG iterating over a page of 1 IAM users
INFO get tags for <ARN of the IAM user>: AccessDenied: User:<ARN of my user> is not authorized to perform: iam:GetUser on resource: <IAM username> with an explicit deny in a service control policy
INFO status code: 403, request id: <request ID>
DEBUG search for IAM instance profiles
INFO error while finding resources to delete error=get tags for <ARN of IAM user> AccessDenied: User:<ARN of my user> is not authorized to perform: iam:GetUser on resource: <IAM username> with an explicit deny in a service control policy status code: 403, request id: <request ID>

Similarly, the error in AWS CloudTrail logs shows the following (note: some fields in brackets have been redacted):
User: arn:aws:sts::<AWS account no>:assumed-role/<role-name>/<user name> is not authorized to perform: iam:GetUser on resource <IAM User> with an explicit deny in a service control policy

It appears that the destroy operation fails when the installer tries to list tags on the only IAM user in the customer's AWS account. As discussed, the SCP does not permit the use of iam:GetUser, so this API call on the IAM user is denied. The installer then enters an endless loop as it continuously retries the operation. We believe the failing call is in the iamUserSearch function in the installer code at pkg/destroy/aws/iamhelpers.go.

See: https://github.com/openshift/installer/blob/16f19ea94ecdb056d4955f33ddacc96c57341bb2/pkg/destroy/aws/iamhelpers.go#L95

There does not appear to be a handler for the "AccessDenied" API error in this function. We therefore request that the access-denied error be handled gracefully and skipped when processing IAM users, allowing the installer to continue with the destroy operation, in the same way that a similar access-denied error is handled in the iamRoleSearch function when processing IAM roles:

See: https://github.com/openshift/installer/blob/16f19ea94ecdb056d4955f33ddacc96c57341bb2/pkg/destroy/aws/iamhelpers.go#L51
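
The requested behavior can be sketched as a loop that skips users whose tag lookup fails with the AccessDenied error code and fails on any other error. This is a minimal sketch, not the installer's code. `awsCodedError` mirrors only the `Code()` method of aws-sdk-go's `awserr.Error` interface, and `apiError`, `tagsFetcher`, `matchingUsers`, the account ID, and the tag key are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// awsCodedError mirrors the Code() part of aws-sdk-go's awserr.Error,
// declared locally so the sketch has no external dependencies.
type awsCodedError interface {
	error
	Code() string
}

// apiError is a hypothetical stand-in for an error returned by the IAM API.
type apiError struct{ code, msg string }

func (e apiError) Error() string { return e.code + ": " + e.msg }
func (e apiError) Code() string  { return e.code }

// tagsFetcher is a hypothetical signature for "get the tags of this IAM user".
type tagsFetcher func(userARN string) (map[string]string, error)

// matchingUsers returns the users whose tag key has the given value. Users
// whose tags cannot be read because of AccessDenied are skipped; any other
// error aborts the search.
func matchingUsers(users []string, getTags tagsFetcher, key, value string) ([]string, error) {
	var matched []string
	for _, arn := range users {
		tags, err := getTags(arn)
		if err != nil {
			var ae awsCodedError
			if errors.As(err, &ae) && ae.Code() == "AccessDenied" {
				fmt.Printf("skipping %s: %v\n", arn, err)
				continue
			}
			return nil, fmt.Errorf("get tags for %s: %w", arn, err)
		}
		if tags[key] == value {
			matched = append(matched, arn)
		}
	}
	return matched, nil
}

func main() {
	// Hypothetical account and users: the first user is hidden by the SCP.
	getTags := func(arn string) (map[string]string, error) {
		if arn == "arn:aws:iam::111122223333:user/restricted" {
			return nil, apiError{"AccessDenied", "explicit deny in a service control policy"}
		}
		return map[string]string{"kubernetes.io/cluster/mycluster": "owned"}, nil
	}
	users := []string{
		"arn:aws:iam::111122223333:user/restricted",
		"arn:aws:iam::111122223333:user/installer",
	}
	got, err := matchingUsers(users, getTags, "kubernetes.io/cluster/mycluster", "owned")
	fmt.Println(got, err)
}
```

Skipping only the AccessDenied code keeps other failures (throttling, network errors) visible instead of silently ignoring every error.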

We therefore request that the following is considered and addressed:

1. Reassess whether the iam:GetUser permission is actually needed for cluster installation/cluster operations.
2. If the permission is required, the installer should provide a warning or halt the installation.
3. During a "destroy" cluster operation, the installer should gracefully handle AccessDenied errors from the API, skip any IAM users whose tags it does not have permission to list, and then continue with the destroy operation.

https://github.com/openshift/installer/pull/7429

Bug OCPBUGS-22020: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-ingress-operator/pull/985

Bug OCPBUGS-24164: Update 4.15 ose-azure-cloud-node-manager-container image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/cloud-provider-azure/pull/97

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cloud-provider-azure/pull/97

Bug OCPBUGS-21597: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/machine-config-operator/pull/4000

Bug OCPBUGS-22265: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cloud-provider-azure/pull/90

Bug OCPBUGS-19243: Update 4.15 ose-gcp-cloud-controller-manager image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/cloud-provider-gcp/pull/38

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cloud-provider-gcp/pull/38

Bug OCPBUGS-19241: Update 4.15 ose-machine-api-operator image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/machine-api-operator/pull/1167

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/machine-api-operator/pull/1167

Bug OCPBUGS-19398: [IBMCloud] Add IPI support for new region eu-es (Madrid)

View the Description View the linked PRs

Description of problem:

IPI on IBM Cloud does not currently support the new eu-es region

Version-Release number of selected component (if applicable):

4.15

How reproducible:

100%

Steps to Reproduce:

1. Create an install-config.yaml for IBM Cloud, following the docs, using the eu-es region
2. Create the manifests (or cluster) using IPI

Actual results:

level=error msg=failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: invalid "install-config.yaml" file: platform.ibmcloud.region: Unsupported value: "eu-es": supported values: "us-south", "us-east", "jp-tok", "jp-osa", "au-syd", "ca-tor", "eu-gb", "eu-de", "br-sao"

Expected results:

An IBM Cloud OCP cluster is successfully created in eu-es

Additional info:

IBM Cloud has started testing a potential fix in eu-es to confirm that all supported cluster types (Public, Private, BYON) work properly there.

https://github.com/openshift/installer/pull/7668

Bug OCPBUGS-22743: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/prometheus/pull/185

Bug OCPBUGS-17035: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-monitoring-operator/pull/2057

Bug OCPBUGS-19176: Update 4.15 ose-service-ca-operator image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/service-ca-operator/pull/221

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/service-ca-operator/pull/221

Bug OCPBUGS-20192: openshift.io/scc: restricted-readonly when setting up router sharding

View the Description View the linked PRs

When setting up router sharding with `endpointPublishingStrategy: Private` in an OCP 4.13.11 BareMetal cluster, the restricted-readonly SCC is assigned to the router pods, causing them to go into CrashLoopBackOff:

~~~
$ oc get pod -n openshift-ingress router-spinque-xxx -oyaml | grep -i scc
openshift.io/scc: restricted-readonly <<<
$ oc get pod -n openshift-ingress router-spinque-xxxj -oyaml | grep -i scc
openshift.io/scc: restricted-readonly <<<<
$ oc get pod -n openshift-ingress router-spinque-xxx -oyaml | grep -i scc
openshift.io/scc: restricted-readonly <<<<
~~~
~~~
router-spinque-xxx 0/1 CrashLoopBackOff 27 2h
router-spinque-xxx 0/1 CrashLoopBackOff 27 2h
router-spinque-xxx 0/1 CrashLoopBackOff 27 2h
~~~

The must-gather, as well as the sos-report from one of the nodes, can be found in case 03624389 in supportshell.

—

The following SCC config can be used to reproduce this issue on any platform:

allowPrivilegeEscalation: true
allowedCapabilities: []
apiVersion: security.openshift.io/v1
defaultAddCapabilities: null
fsGroup:
  type: MustRunAs
groups:
- system:authenticated
kind: SecurityContextConstraints
metadata:
  name: bad-router
priority: 0
readOnlyRootFilesystem: true
requiredDropCapabilities:
- KILL
- MKNOD
- SETUID
- SETGID
runAsUser:
  type: MustRunAsRange
seLinuxContext:
  type: MustRunAs
supplementalGroups:
  type: RunAsAny
users: []
volumes:
- configMap
- downwardAPI
- emptyDir
- persistentVolumeClaim
- projected
- secret

Save the above yaml as bad-router-scc.yaml then apply it to your cluster:

$ oc apply -f bad-router-scc.yaml

Force the restart of router pods, such as by deleting one:

$ oc delete pod router-default-6465854689-gvjhs

The newly started pod(s) should be running but not ready, with the bad-router SCC:

$ oc get pods
NAME                              READY   STATUS    RESTARTS   AGE
router-default-6465854689-7x558   0/1     Running   0          49s
$ oc get pod router-default-6465854689-7x558 -o yaml|grep scc
    openshift.io/scc: bad-router

If you wait long enough, the pod restarts multiple times and eventually enters the CrashLoopBackOff state.
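
OpenShift's documentation describes how SCC admission orders candidate SCCs: highest priority first (a nil priority counts as 0), then from most restrictive to least restrictive, then by name. The first SCC that the pod's service account may use and that validates the pod is applied. Because bad-router is granted to system:authenticated, it is a candidate for every pod and can outrank the SCC the router is meant to get. The sketch below models only that ordering. The `scc` struct, the restrictiveness scores, and the `intended-scc` name are hypothetical; the real admission plugin derives restrictiveness from the SCC's settings.

```go
package main

import (
	"fmt"
	"sort"
)

// scc is a simplified, hypothetical stand-in for SecurityContextConstraints,
// modeling only the fields that drive ordering.
type scc struct {
	Name     string
	Priority int // a nil priority on a real SCC is treated as 0
	// Restrictiveness is a hypothetical score: lower means more restrictive.
	Restrictiveness int
}

// orderSCCs sorts candidates by priority (highest first), then by
// restrictiveness (most restrictive first), then by name.
func orderSCCs(candidates []scc) []scc {
	out := append([]scc(nil), candidates...)
	sort.SliceStable(out, func(i, j int) bool {
		if out[i].Priority != out[j].Priority {
			return out[i].Priority > out[j].Priority
		}
		if out[i].Restrictiveness != out[j].Restrictiveness {
			return out[i].Restrictiveness < out[j].Restrictiveness
		}
		return out[i].Name < out[j].Name
	})
	return out
}

func main() {
	// Hypothetical scores: bad-router (readOnlyRootFilesystem, MustRunAsRange)
	// is scored as more restrictive than the router's intended SCC.
	candidates := []scc{
		{Name: "intended-scc", Priority: 0, Restrictiveness: 20},
		{Name: "bad-router", Priority: 0, Restrictiveness: 10},
	}
	for _, s := range orderSCCs(candidates) {
		fmt.Println(s.Name)
	}
}
```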

https://github.com/openshift/cluster-ingress-operator/pull/981

Task HOSTEDCP-1284: Bump k8s.io/pod-security-admission to v0.28.3

View the Description View the linked PRs

Bump k8s.io/pod-security-admission to v0.28.3

https://github.com/openshift/hypershift/pull/3181

Bug OCPBUGS-18494: Upgrade DomainMapping CRD to API version v1beta1

View the Description View the linked PRs

Description of problem:

The DomainMapping CRD still uses API version v1alpha1, but v1alpha1 will be removed in Serverless Operator 1.33. Upgrade the API version to v1beta1, which has been available since Serverless Operator 1.21.

Additional info:

NOTE: This should be backported to 4.11; also check the minimum Serverless Operator version supported in 4.11.

slack thread: https://redhat-internal.slack.com/archives/CJYKV1YAH/p1693809331579619

https://github.com/openshift/console/pull/13133

Bug OCPBUGS-21793: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-machine-approver/pull/204

Bug OCPBUGS-23996: Trust bundle CA configmap should have ownership annotations

View the Description View the linked PRs

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/300

Bug OCPBUGS-18304: vsphere IPI: missing guestinfo.domain in bootstrap VM

View the Description View the linked PRs

Description of problem:

https://github.com/openshift/installer/pull/6770 reverted part of https://github.com/openshift/installer/pull/5788, which had set guestinfo.domain for the bootstrap machine. This breaks some OKD installations that require that setting.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/installer/pull/7451

Bug OCPBUGS-20268: [Azure] Unit tests have deadlock condition in termination handler

View the Description View the linked PRs

Description of problem:

Because of the way the termination handler's unit tests are configured, in some cases the counter of HTTP requests to the mock handler can cause the test to deadlock and time out. This happens randomly, because the ordering of the tests affects when the bug occurs.

Version-Release number of selected component (if applicable):

4.13+

How reproducible:

It happens randomly when run in CI or when the full suite is run, but if the tests are focused, it happens every time.
Focusing on "poll URL cannot be reached" triggers the deadlock.

Steps to Reproduce:

1. Add `-focus "poll URL cannot be reached"` to the unit test Ginkgo arguments
2. Run `make unit`

Actual results:

The test suite hangs after this output:
"Handler Suite when running the handler when polling the termination endpoint and the poll URL cannot be reached should return an error /home/mike/dev/machine-api-provider-aws/pkg/termination/handler_test.go:197"

Expected results:

Tests pass

Additional info:

To fix this, we need to isolate the test in its own Context block; this patch should do the trick:

diff --git a/pkg/termination/handler_test.go b/pkg/termination/handler_test.go
index 2b98b08b..0f85feae 100644
--- a/pkg/termination/handler_test.go
+++ b/pkg/termination/handler_test.go
@@ -187,7 +187,9 @@ var _ = Describe("Handler Suite", func() {
                                        Consistently(nodeMarkedForDeletion(testNode.Name)).Should(BeFalse())
                                })
                        })
+               })
 
+               Context("when the termination endpoint is not valid", func() {
                        Context("and the poll URL cannot be reached", func() {
                                BeforeEach(func() {
                                        nonReachable := "abc#1://localhost"

https://github.com/openshift/machine-api-provider-azure/pull/77

Bug OCPBUGS-21616: Create from YAML crashes when YAML editor is empty

View the Description View the linked PRs

Description of problem:

When any object is created from YAML with an empty editor window, the application crashes.

Version-Release number of selected component (if applicable):

How reproducible:

always

Steps to Reproduce:

1. Navigate to Virtualization -> VirtualMachines 
2. Open "Create VirtualMachine" menu 
3. Select "With YAML"
4. Clear the editor content
5. Click "Create" button

Actual results:

The application crashes

Expected results:

User is notified about invalid/empty editor content.

Additional info:

The same happens in 4.13

https://github.com/openshift/console/pull/13176

Bug OCPBUGS-18371: Searching for items in quick search is confusing

View the Description View the linked PRs

Description of problem:

In the quick search, if you search for the word "net", you can see two options with the same name and description: one for the Source-to-Image option and the other for the sample option.
There is no way to differentiate them in the quick search.

Version-Release number of selected component (if applicable):

How reproducible:

Always

Steps to Reproduce:

1. Go to topology or Add page and select quick search
2. Search for "net" or "node"; you will see confusing options
3.

Actual results:

Similar options with no differentiation in the quick search menu

Expected results:

Some way to differentiate different options in the quick search menu

Additional info:

https://github.com/openshift/console/pull/13381

Bug OCPBUGS-19206: Update 4.15 thanos image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/thanos/pull/117

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/thanos/pull/117

Bug OCPBUGS-3403: Console crashes when clicked on "Sort by" table header on "Resources" tab of an Operand's instance page

View the Description View the linked PRs

Description of problem:

The console crashes when a "Sort by" table header is clicked on the "Resources" tab of an Operand's instance page.

Version-Release number of selected component (if applicable):

4.13.0-0.ci-2022-11-07-202549

How reproducible:

100% (tested with 3 different Operands from 3 different Operators)

Steps to Reproduce:

1. Go to OperatorHub and install an Operator (e.g. Red Hat Integration - AMQ Streams)
2. After Operator is installed, create an Operand instance (e.g. Kafka)
3. Wait until the Operand instance is created successfully, then go to the instance's Details page --> Resources tab (e.g. Installed Operators --> amqstreams.v2.2.0-2 --> Kafka details)
4. Click any table header to sort the resource table

Actual results:

Console crashed

Expected results:

Resource table sorted accordingly.

Additional info:

I was testing this specifically with the "OLM copiedCSVsDisabled" feature; however, I could still reproduce this crash after setting that feature back to `false`, so I am not sure whether it is related to that feature. I cross-checked with a 4.12 nightly and could not reproduce the crash there.

https://github.com/openshift/console/pull/13103

Task OSASINFRA-3297: MAPO: remove unnecessary retrieval of Network ID during Port specification

View the linked PRs

https://github.com/openshift/machine-api-provider-openstack/pull/96

Bug OCPBUGS-19961: Undiagnosed panic, pods/openshift-ovn-kubernetes_ovnkube-node-mtws2_ovnkube-controller

View the Description View the linked PRs

Description of problem:

Got undiagnosed panic:

: Undiagnosed panic detected in pod 0s {  pods/openshift-ovn-kubernetes_ovnkube-node-mtws2_ovnkube-controller_previous.log.gz:E0929 20:36:20.743430    5682 runtime.go:79] Observed a panic: &runtime.TypeAssertionError{_interface:(*runtime._type)(0x1f9aaa0), concrete:(*runtime._type)(0x20da3e0), asserted:(*runtime._type)(0x22d0600), missingMethod:""} (interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.NetworkAttachmentDefinition)}
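
The panic message shows an unchecked type assertion receiving a `cache.DeletedFinalStateUnknown` tombstone, which client-go informers deliver to delete handlers when the final state of a deleted object was missed. A common pattern is to unwrap the tombstone before asserting the type. This is a minimal Go sketch with local stand-ins for the client-go and NAD types so it runs without dependencies; the stand-in types and `nadFromDeleteEvent` are hypothetical, not the ovn-kubernetes code.

```go
package main

import "fmt"

// deletedFinalStateUnknown mirrors the fields of client-go's
// cache.DeletedFinalStateUnknown (Key string, Obj interface{}).
type deletedFinalStateUnknown struct {
	Key string
	Obj interface{}
}

// networkAttachmentDefinition is a minimal stand-in for the NAD type.
type networkAttachmentDefinition struct {
	Namespace, Name string
}

// nadFromDeleteEvent extracts the NAD from a delete notification, unwrapping
// a tombstone when needed, instead of an unchecked assertion that panics.
func nadFromDeleteEvent(obj interface{}) (*networkAttachmentDefinition, error) {
	if nad, ok := obj.(*networkAttachmentDefinition); ok {
		return nad, nil
	}
	tombstone, ok := obj.(deletedFinalStateUnknown)
	if !ok {
		return nil, fmt.Errorf("unexpected object type %T", obj)
	}
	nad, ok := tombstone.Obj.(*networkAttachmentDefinition)
	if !ok {
		return nil, fmt.Errorf("tombstone %q contains unexpected object type %T", tombstone.Key, tombstone.Obj)
	}
	return nad, nil
}

func main() {
	direct := &networkAttachmentDefinition{Namespace: "ns1", Name: "localnet"}
	wrapped := deletedFinalStateUnknown{Key: "ns1/localnet", Obj: direct}
	for _, ev := range []interface{}{direct, wrapped, "garbage"} {
		nad, err := nadFromDeleteEvent(ev)
		fmt.Println(nad, err)
	}
}
```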

in this job:

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-gcp-ovn-upgrade/1707819503263420416

Version-Release number of selected component (if applicable):

4.15 ci payload:

https://amd64.ocp.releases.ci.openshift.org/releasestream/4.15.0-0.ci/release/4.15.0-0.ci-2023-09-29-180633
  https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/aggregated-gcp-ovn-upgrade-4.15-micro-release-openshift-release-analysis-aggregator/1707819513325555712

How reproducible:

This is the first time I have noticed it on 4.15.

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/ovn-kubernetes/pull/1935

Bug OCPBUGS-25232: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/csi-external-provisioner/pull/82

Bug OCPBUGS-9331: Manila deployed without metrics endpoints

View the Description View the linked PRs

Version: 4.11.0-0.nightly-2022-06-22-015220

$ openshift-install version
openshift-install 4.11.0-0.nightly-2022-06-22-015220
built from commit f912534f12491721e3874e2bf64f7fa8d44aa7f5
release image registry.ci.openshift.org/ocp/release@sha256:9c2e9cafaaf48464a0d27652088d8fb3b2336008a615868aadf8223202bdc082
release architecture amd64

Platform: OSP 16.1.8 with manila service

Please specify:

What happened?

In a fresh 4.11 cluster (with Kuryr, though that should not be related to the issue), there are no endpoints for the Manila metrics:

$ oc -n openshift-manila-csi-driver get endpoints
NAME                                   ENDPOINTS   AGE
manila-csi-driver-controller-metrics   <none>      3h7m

$ oc -n openshift-manila-csi-driver describe endpoints
Name: manila-csi-driver-controller-metrics
Namespace: openshift-manila-csi-driver
Labels: app=manila-csi-driver-controller-metrics
Annotations: endpoints.kubernetes.io/last-change-trigger-time: 2022-06-22T10:30:06Z
Subsets:
Events: <none>

$ oc -n openshift-manila-csi-driver get all
NAME                                                         READY   STATUS    RESTARTS       AGE
pod/csi-nodeplugin-nfsplugin-4mqgx                           1/1     Running   0              3h7m
pod/csi-nodeplugin-nfsplugin-555ns                           1/1     Running   0              3h2m
pod/csi-nodeplugin-nfsplugin-bn26j                           1/1     Running   0              3h7m
pod/csi-nodeplugin-nfsplugin-lfsm7                           1/1     Running   0              3h1m
pod/csi-nodeplugin-nfsplugin-xwxnz                           1/1     Running   0              3h1m
pod/csi-nodeplugin-nfsplugin-zqnkt                           1/1     Running   0              3h7m
pod/openstack-manila-csi-controllerplugin-7fc4b4f56d-ddn25   6/6     Running   2 (158m ago)   3h7m
pod/openstack-manila-csi-controllerplugin-7fc4b4f56d-p9jss   6/6     Running   0              3h6m
pod/openstack-manila-csi-nodeplugin-6w426                    2/2     Running   0              3h2m
pod/openstack-manila-csi-nodeplugin-fvsjr                    2/2     Running   0              3h7m
pod/openstack-manila-csi-nodeplugin-g9x4t                    2/2     Running   0              3h1m
pod/openstack-manila-csi-nodeplugin-gp76x                    2/2     Running   0              3h7m
pod/openstack-manila-csi-nodeplugin-n9v9t                    2/2     Running   0              3h7m
pod/openstack-manila-csi-nodeplugin-s6srv                    2/2     Running   0              3h1m

NAME                                           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)           AGE
service/manila-csi-driver-controller-metrics   ClusterIP   172.30.118.232   <none>        443/TCP,444/TCP   3h7m

NAME                                             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/csi-nodeplugin-nfsplugin          6         6         6       6            6           <none>          3h7m
daemonset.apps/openstack-manila-csi-nodeplugin   6         6         6       6            6           <none>          3h7m

NAME                                                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/openstack-manila-csi-controllerplugin   2/2     2            2           3h7m

NAME                                                               DESIRED   CURRENT   READY   AGE
replicaset.apps/openstack-manila-csi-controllerplugin-5697ccfcbf   0         0         0       3h7m
replicaset.apps/openstack-manila-csi-controllerplugin-7fc4b4f56d   2         2         2       3h7m

This can prevent Manila metrics from being retrieved.

openshift_install.log: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/j2pg/DFG-osasinfra-shiftstack_periodic_subjob-ocp_install-4.11-kuryr-ipi/15/undercloud-0/home/stack/ostest/.openshift_install.log.gz

must_gather: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/j2pg/DFG-osasinfra-shiftstack_periodic_subjob-ocp_install-4.11-kuryr-ipi/15/infrared/.workspaces/workspace_2022-06-22_09-58-34/must-gather-install.tar.gz/

By comparison, cinder-csi is configured with such endpoints:

$ oc -n openshift-cluster-csi-drivers get endpoints
NAME                                             ENDPOINTS                                                          AGE
openstack-cinder-csi-driver-controller-metrics   10.196.1.100:9203,10.196.2.82:9203,10.196.1.100:9205 + 5 more...   3h15m

$ oc -n openshift-cluster-csi-drivers describe endpoints
Name: openstack-cinder-csi-driver-controller-metrics
Namespace: openshift-cluster-csi-drivers
Labels: app=openstack-cinder-csi-driver-controller-metrics
Annotations: endpoints.kubernetes.io/last-change-trigger-time: 2022-06-22T10:58:57Z
Subsets:
  Addresses: 10.196.1.100,10.196.2.82
  NotReadyAddresses: <none>
  Ports:
    Name           Port  Protocol
    ----           ----  --------
    attacher-m     9203  TCP
    snapshotter-m  9205  TCP
    provisioner-m  9202  TCP
    resizer-m      9204  TCP

Events: <none>

https://github.com/openshift/csi-driver-manila-operator/pull/210

Bug OCPBUGS-25362: Set the correct kubelet wrapper selinux permissions within MCO

View the Description View the linked PRs

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/machine-config-operator/pull/4074

Bug OCPBUGS-26399: Missing support for singular VIP in ACI for BareMetal

View the Description View the linked PRs

Description of problem:

Since the singular variant of APIVIP/IngressVIP has been removed as part of https://github.com/openshift/installer/pull/7574, the appliance disk image e2e job is now failing: https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-appliance-master-e2e-compact-ipv4-static

The job fails because the appliance supports only 4.14, which still requires the singular variant of the VIP properties.

Version-Release number of selected component (if applicable):

4.16

How reproducible:

Always

Steps to Reproduce:

1. Invoke appliance e2e job on master: https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-appliance-master-e2e-compact-ipv4-static

Actual results:

Job fails with the following validation error:
"the Machine Network CIDR can be defined by setting either the API or Ingress virtual IPs"
This is due to the missing apiVIP and ingressVIP in AgentClusterInstall.

Expected results:

AgentClusterInstall should also include the singular 'apiVIP' and 'ingressVIP', and the e2e job should complete successfully.
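
The expected behavior can be sketched as deriving the singular fields from the first entry of each plural list whenever the singular field is empty. This is a minimal Go sketch: `agentClusterInstallNetworking` is a hypothetical stand-in, not the real AgentClusterInstall Go type, and using the first list entry is an assumption.

```go
package main

import "fmt"

// agentClusterInstallNetworking is a hypothetical, trimmed-down stand-in for
// the VIP fields of an AgentClusterInstall spec.
type agentClusterInstallNetworking struct {
	APIVIP      string
	APIVIPs     []string
	IngressVIP  string
	IngressVIPs []string
}

// withSingularVIPs fills each empty singular field from the first entry of
// the matching plural list, so consumers that only read the singular form
// (such as 4.14) still find a value. Existing singular values are kept.
func withSingularVIPs(n agentClusterInstallNetworking) agentClusterInstallNetworking {
	if n.APIVIP == "" && len(n.APIVIPs) > 0 {
		n.APIVIP = n.APIVIPs[0]
	}
	if n.IngressVIP == "" && len(n.IngressVIPs) > 0 {
		n.IngressVIP = n.IngressVIPs[0]
	}
	return n
}

func main() {
	n := withSingularVIPs(agentClusterInstallNetworking{
		APIVIPs:     []string{"192.168.111.5"},
		IngressVIPs: []string{"192.168.111.4"},
	})
	fmt.Println(n.APIVIP, n.IngressVIP)
}
```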

Additional info:

https://github.com/openshift/installer/pull/7859

Story TRT-1362: Disruption Automated Test Data Update Stuck for a Month

View the Description View the linked PRs

https://github.com/openshift/origin/pull/28360

Failing unit tests.

Every row has MasterNodesUpdated set to null, which might have something to do with it. The fix would be in ci-tools.

https://github.com/openshift/origin/pull/28409

Bug OCPBUGS-20270: The console repo readme is missing instructions for enabling monitoring locally

View the Description View the linked PRs

Description of problem:

Since the move to a dynamic plugin, the monitoring UI does not work when running locally unless some extra steps are taken: Bridge must be configured to use this plugin, and the plugin must be running alongside it. Our README does not include this information or instructions.

Version-Release number of selected component (if applicable):

4.15

How reproducible:

Always

Steps to Reproduce:

1. Read the readme

Actual results:

The readme does not include instructions for running monitoring locally

Expected results:

The readme includes instructions for running monitoring locally

https://github.com/openshift/console/pull/13226

Bug OCPBUGS-22724: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/kubernetes/pull/1800

Bug OCPBUGS-20572: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-baremetal-operator/pull/375

Bug TRT-1340: openshift-test run command should offer options to disable/enable monitor tests

View the Description View the linked PRs

Some IBM jobs using openshift-test run failed due to a recent monitor refactor. They request command options to disable monitor tests in openshift-test run, as is already implemented in openshift-test run-monitor.

IBM ROKS needs this; I will link to the Slack thread.

https://github.com/openshift/origin/pull/28371

Bug OCPBUGS-19232: Update 4.15 vertical-pod-autoscaler-operator image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/vertical-pod-autoscaler-operator/pull/147

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Important: ART has recorded in their product data that bugs for
this component should be opened against Jira project "OCPBUGS" and
component "Node / Autoscaler (HPA, VPA, CMA)". This project or component does not exist. Jira
should either be updated to include this component or @release-artists should be
notified of the proper mapping in the #forum-ocp-art Slack channel.

Component name: ose-vertical-pod-autoscaler-operator-container .
Jira mapping: https://github.com/openshift-eng/ocp-build-data/blob/main/product.yml

https://github.com/openshift/kubernetes-autoscaler/pull/262

Bug OCPBUGS-19281: Update 4.15 openshift-enterprise-deployer image to be consistent with ART

Please review the following PR: https://github.com/openshift/oc/pull/1544

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/oc/pull/1544

Task MON-3287: Move etcd monitoring RBAC to CEO

The ServiceMonitors and other related resources were moved in https://issues.redhat.com/browse/MON-669

We think moving the RBAC makes more sense as well: https://github.com/openshift/cluster-monitoring-operator/pull/2039#discussion_r1262307325

Bug OCPBUGS-15220: The multus-admission-controller deployment in a hypershift cluster needs to ensure pods run in separate zones

Description of problem:

To ensure that pods run in separate zones in a hypershift cluster, a PodAntiAffinity spec should be provided.

Version-Release number of selected component (if applicable):

4.12, 4.13, 4.14

How reproducible:

Always

Steps to Reproduce:

1. Create a hypershift control plane in ha mode.
2. Observe the multus admission controller pods.
3.

Actual results:

Not all pods are scheduled in separate zones.

Expected results:

Pods are scheduled in separate zones.

Additional info:

https://github.com/openshift/cluster-network-operator/pull/1795

Bug OCPBUGS-19101: Update 4.15 ose-aws-ebs-csi-driver image to be consistent with ART

Please review the following PR: https://github.com/openshift/aws-ebs-csi-driver/pull/235

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/aws-ebs-csi-driver/pull/235

Bug OCPBUGS-18307: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/multus-cni/pull/180

Bug OCPBUGS-24814: Update 4.16 ose-installer-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/installer/pull/7816

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/installer/pull/7816

Bug OCPBUGS-7638: Adding a 2nd default route to an OCP Cluster

Description of problem:

We have a customer on OCP 4.10.47, using OVN-K8S in local gateway mode, who needs to either update the default route or add an additional one. The question is whether this can be done using interface hints, so that the new default route has a higher (better) priority than the day-0 default route, while node reboots and cluster upgrades do not affect OVN (based on the interface hints, OVN can keep using the original default route even though it would have a lower priority).

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

Task OSASINFRA-3284: Add Events permission to OpenStack's CCM cluster role

Required after https://github.com/kubernetes/cloud-provider-openstack/pull/2383

https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/292

Bug OCPBUGS-16922: AdmissionWebhookMatchConditions tests are failing with Kubernetes 1.28 bump

Description of problem:

The AdmissionWebhookMatchConditions feature gate is enabled by default in Kubernetes 1.28, but we currently disable it in openshift/api.

As a result, e2e tests are failing with the Kubernetes 1.28 bump:

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_kubernetes/1646/pull-ci-openshift-kubernetes-master-e2e-aws-ovn-fips/1684354421837795328

Version-Release number of selected component (if applicable):

4.15

How reproducible:

Always

Steps to Reproduce:

1.
2.
3.

Actual results:

AdmissionWebhookMatchConditions tests are failing

Expected results:

AdmissionWebhookMatchConditions tests should pass

Additional info:

Let me know once this is fixed so that we can drop the commit that skips these tests.

https://github.com/openshift/kubernetes/pull/1790

Bug OCPBUGS-21738: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-kube-storage-version-migrator-operator/pull/95

Bug OCPBUGS-25808: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cloud-provider-gcp/pull/55

Bug OCPBUGS-24695: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/oc/pull/1620

Bug OCPBUGS-14994: Ingress operator attempts spurious deletes of the client CA configmap when deleting an IngressController that has a client TLS configured

Description of problem

When the ingress operator's clientca-configmap controller reconciles an IngressController, this controller attempts to add a finalizer to the IngressController if that finalizer is absent. This controller erroneously attempts to add the missing finalizer even if the IngressController is marked for deletion, which results in an error. This error causes the controller to retry the deletion and log the error multiple times.

Version-Release number of selected component (if applicable)

I observed this in CI for OCP 4.14 and was able to reproduce it on 4.11.37, and it probably affects earlier versions as well. The problematic code was added in https://github.com/openshift/cluster-ingress-operator/pull/450/commits/0f36470250c3089769867ebd72e25c413a29cda2 in OCP 4.9 to implement ~~NE-323~~.

How reproducible

Easily.

Steps to Reproduce

1. Create a configmap in the "openshift-config" namespace (to reproduce this issue, it is not necessary that the configmap have a valid TLS certificate and key):

oc -n openshift-config create configmap client-ca-cert

2. Create an IngressController that specifies spec.clientTLS.clientCA.name to point to the configmap from the previous step:

oc create -f - <<EOF
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: test-client-ca-configmap
  namespace: openshift-ingress-operator
spec:
  domain: example.xyz
  endpointPublishingStrategy:
    type: Private
  clientTLS:
    clientCA:
      name: client-ca-cert
    clientCertificatePolicy: Required
EOF

3. Delete the IngressController:

oc -n openshift-ingress-operator delete ingresscontrollers/test-client-ca-configmap

4. Check the ingress operator's logs:

oc -n openshift-ingress-operator logs -c ingress-operator deployments/ingress-operator

Actual results

The ingress operator logs several attempts to add the finalizer to the IngressController after it has been marked for deletion:

2023-06-15T02:17:12.419Z        ERROR   operator.init   controller/controller.go:273    Reconciler error        {"controller": "clientca_configmap_controller", "object": {"name":"test-client-ca-configmap","namespace":"openshift-ingress-operator"}, "namespace": "openshift-ingress-operator", "name": "test-client-ca-configmap", "reconcileID": "2274f55e-e5bd-4fdb-973e-821a44cf2ebf", "error": "failed to add client-ca-configmap finalizer: IngressController.operator.openshift.io \"test-client-ca-configmap\" is invalid: metadata.finalizers: Forbidden: no new finalizers can be added if the object is being deleted, found new finalizers []string{\"ingresscontroller.operator.openshift.io/finalizer-clientca-configmap\"}"}

The deletion does succeed, errors notwithstanding.

Expected results

The ingress operator should succeed in deleting the IngressController without attempting to re-add the finalizer to the IngressController after it has been marked for deletion.
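
A minimal Go sketch of the expected behavior, assuming a simplified stand-in for the IngressController type (the `ingressController` struct and the `shouldAddFinalizer`, `hasFinalizer`, and `markForDeletion` helpers are hypothetical, not the operator's actual code). The reconciler checks whether the object is already marked for deletion before trying to add the finalizer, because the API server forbids new finalizers on such objects, as the logged error shows.

```go
package main

import (
	"fmt"
	"time"
)

// ingressController is a hypothetical, trimmed-down stand-in for the real
// IngressController type; only the fields this sketch needs are modeled.
type ingressController struct {
	Name              string
	DeletionTimestamp *time.Time // non-nil once the object is marked for deletion
	Finalizers        []string
}

// clientCAFinalizer is the finalizer name that appears in the error log above.
const clientCAFinalizer = "ingresscontroller.operator.openshift.io/finalizer-clientca-configmap"

// markForDeletion simulates the API server setting the deletion timestamp.
func markForDeletion(ic *ingressController) {
	now := time.Now()
	ic.DeletionTimestamp = &now
}

func hasFinalizer(ic *ingressController, name string) bool {
	for _, f := range ic.Finalizers {
		if f == name {
			return true
		}
	}
	return false
}

// shouldAddFinalizer reports whether the reconciler should add the finalizer.
// The deletion check comes first: adding a finalizer to an object that is
// being deleted is rejected, which is the error the operator kept logging.
func shouldAddFinalizer(ic *ingressController) bool {
	if ic.DeletionTimestamp != nil {
		return false
	}
	return !hasFinalizer(ic, clientCAFinalizer)
}

func main() {
	live := &ingressController{Name: "default"}
	deleting := &ingressController{Name: "test-client-ca-configmap"}
	markForDeletion(deleting)
	fmt.Println(shouldAddFinalizer(live), shouldAddFinalizer(deleting)) // true false
}
```

In a real controller the same guard would typically sit at the top of the reconcile path, so the finalizer-removal branch can run without first tripping over the add.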

https://github.com/openshift/cluster-ingress-operator/pull/948

Bug OCPBUGS-19093: Skip agent-tui on OCI

Although tty1 always exists, OCI only has a serial console available (assuming it is enabled - see ~~OCPBUGS-19092~~), so the user doesn't see anything on the console while agent-tui is running (and in fact the systemd progress output is suspended for the duration).

Network configuration of any kind is rarely needed in the cloud anyway. So on OCI specifically, we are mostly slowing down boot by 20s for no real benefit. We should disable agent-tui in this case, either by disabling the service or by simply not adding the binary to the ISO image.

Bug OCPBUGS-19376: [gcp] IPI installation using the service account attached to a GCP VM always fail with error "unable to parse credentials"

Description of problem:

IPI installation using the service account attached to a GCP VM always fails with the error "unable to parse credentials"

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-09-15-233408

How reproducible:

Always

Steps to Reproduce:

1. "create install-config"
2. edit install-config.yaml to insert "credentialsMode: Manual"
3. "create manifests"
4. manually create the required credentials and copy the manifests to installation-dir/manifests directory
5. launch the bastion host, binding it to the pre-configured service account ipi-on-bastion-sa@openshift-qe.iam.gserviceaccount.com with scopes set to "cloud-platform"
6. copy the installation-dir and openshift-install to the bastion host
7. try "create cluster" on the bastion host

Actual results:

The installation failed on "Creating infrastructure resources"

Expected results:

The installation should succeed.

Additional info:

(1) FYI the 4.12 epic: https://issues.redhat.com/browse/CORS-2260

(2) 4.12.34 doesn't have the issue (Flexy-install/234112/). 

(3) 4.13.13 doesn’t have the issue (Flexy-install/234126/).

(4) The 4.14 errors (Flexy-install/234113/):
09-19 16:13:44.919  level=info msg=Consuming Master Ignition Config from target directory
09-19 16:13:44.919  level=info msg=Consuming Bootstrap Ignition Config from target directory
09-19 16:13:44.919  level=info msg=Consuming Worker Ignition Config from target directory
09-19 16:13:44.919  level=info msg=Credentials loaded from gcloud CLI defaults
09-19 16:13:49.071  level=info msg=Creating infrastructure resources...
09-19 16:13:50.950  level=error
09-19 16:13:50.950  level=error msg=Error: unable to parse credentials
09-19 16:13:50.950  level=error
09-19 16:13:50.950  level=error msg=  with provider["openshift/local/google"],
09-19 16:13:50.950  level=error msg=  on main.tf line 10, in provider "google":
09-19 16:13:50.950  level=error msg=  10: provider "google" {
09-19 16:13:50.950  level=error
09-19 16:13:50.950  level=error msg=unexpected end of JSON input
09-19 16:13:50.950  level=error msg=failed to fetch Cluster: failed to generate asset "Cluster": failure applying terraform for "cluster" stage: failed to create cluster: failed to apply Terraform: exit status 1
09-19 16:13:50.950  level=error
09-19 16:13:50.950  level=error msg=Error: unable to parse credentials
09-19 16:13:50.950  level=error
09-19 16:13:50.950  level=error msg=  with provider["openshift/local/google"],
09-19 16:13:50.950  level=error msg=  on main.tf line 10, in provider "google":
09-19 16:13:50.950  level=error msg=  10: provider "google" {
09-19 16:13:50.950  level=error
09-19 16:13:50.950  level=error msg=unexpected end of JSON input
09-19 16:13:50.950  level=error

https://github.com/openshift/installer/pull/7519

Bug OCPBUGS-21972: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/console-operator/pull/802

Bug OCPBUGS-23378: PF5 bubble component with wrong layout in Create NetworkPolicy page

Description of problem:

The bubble box is displayed with the wrong layout.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-11-16-110328

How reproducible:

Always

Steps to Reproduce:

1. Make sure there are no pods in the project you are using
2. Navigate to Networking -> NetworkPolicies -> Create NetworkPolicy page, and click 'affected pods' in the Pod selector section
3. Check the layout in the bubble component

Actual results:

The layout is incorrect (shared file: https://drive.google.com/file/d/1I8e2ZkiFO2Gu4nSt9kJ6JmRG3LdvkE-u/view?usp=drive_link)

Expected results:

The layout should be correct.

Additional info:

https://github.com/openshift/console/pull/13390

Bug MGMT-13461: Day-2 hosts cannot join imported Tang clusters

Description of the problem:

Agents don't run the StepTypeTangConnectivityCheck step on day-2 hosts in imported clusters

How reproducible:

Unknown

Steps to reproduce:

1. Install day-1 cluster with Tang

2. Attempt to add day-2 host

Actual results:

disk-encryption-requirements-satisfied is stuck in pending

Expected results:

disk-encryption-requirements-satisfied should eventually either fail or succeed

https://github.com/openshift/assisted-service/pull/5700

Bug OCPBUGS-18137: [GCP 4.14] [Azure/AWS <=4.13] Pod didn't trigger arm64 machineset scale out from 0 when a required node selector term on non-amd64 nodes is set

Description of problem:

When a workload includes a node selector term on the label kubernetes.io/arch and the allowed values do not include amd64, the autoscaler does not trigger a scale-out of a valid non-amd64 machine set if its current replica count is 0 and (for 4.14+) no architecture capacity annotation is set (ref ~~MIXEDARCH-129~~).

The issue is due to https://github.com/openshift/kubernetes-autoscaler/blob/f0ceeacfca57014d07f53211a034641d52d85cfd/cluster-autoscaler/cloudprovider/utils.go#L33

This bug should first be addressed for clusters that have the same architecture for the control plane and the data plane.

In the case of multi-arch compute clusters, there is probably no alternative to having the capacity annotation properly set in the machine set, either manually or by the cloud provider actuator, as already discussed in the ~~MIXEDARCH-129~~ work; otherwise, the autoscaler has to rely on the control plane architecture.
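
A hedged Go sketch of the fallback described above, for a machine set scaled to 0: take kubernetes.io/arch from a capacity annotation when one is present, and otherwise use a supplied default such as the control plane architecture rather than a fixed value. The annotation key `capacity.cluster-autoscaler.kubernetes.io/labels`, its `key=value,...` format, and the `templateArch` helper are assumptions for illustration; the actual autoscaler code may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Assumed annotation key and format ("k1=v1,k2=v2"); illustrative only.
const capacityLabelsAnnotation = "capacity.cluster-autoscaler.kubernetes.io/labels"

const archLabel = "kubernetes.io/arch"

// templateArch picks the architecture to advertise on the template node built
// for a machine set with 0 replicas. An explicit capacity annotation wins;
// otherwise the caller-supplied fallback (e.g. the control plane arch) is used.
func templateArch(annotations map[string]string, fallback string) string {
	for _, pair := range strings.Split(annotations[capacityLabelsAnnotation], ",") {
		key, value, ok := strings.Cut(strings.TrimSpace(pair), "=")
		if ok && key == archLabel && value != "" {
			return value
		}
	}
	return fallback
}

func main() {
	annotated := map[string]string{capacityLabelsAnnotation: "kubernetes.io/arch=arm64"}
	fmt.Println(templateArch(annotated, "amd64")) // arm64
	fmt.Println(templateArch(nil, "arm64"))       // arm64 (control plane arch)
}
```

With the arch derived this way, a pod whose node affinity requires arm64 would match the template node of an arm64 machine set even when that set has no running machines.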

Version-Release number of selected component (if applicable):

- ARM64 IPI on GCP 4.14
- ARM64 IPI on AWS and Azure <=4.13
- In general, non-amd64 single-arch clusters supporting autoscale from 0

How reproducible:

Always

Steps to Reproduce:

1. Create an arm64 IPI cluster on GCP
2. Set one of the machinesets to have 0 replicas:
    oc scale --replicas=0 -n openshift-machine-api machineset/adistefa-a1-zn8pg-worker-f
3. Deploy the default autoscaler
4. Deploy the machine autoscaler for the given machineset
5. Deploy a workload with node affinity to arm64-only nodes, large resource requests, and enough replicas.

Actual results:

From the pod events: 

pod didn't trigger scale-up: 1 node(s) didn't match Pod's node affinity/selector

Expected results:

The cluster autoscaler scales the machineset with 0 replicas in order to provide resources for the pending pods.

Additional info:

---
apiVersion: autoscaling.openshift.io/v1
kind: ClusterAutoscaler
metadata:
  name: default
spec: {}
---
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: worker-us-east-1a
  namespace: openshift-machine-api
spec:
  minReplicas: 0
  maxReplicas: 12
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: adistefa-a1-zn8pg-worker-f
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: openshift-machine-api
  name: 'my-deployment'
  annotations: {}
spec:
  selector:
    matchLabels:
      app: name
  replicas: 3
  template:
    metadata:
      labels:
        app: name
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                - key: kubernetes.io/arch
                  operator: In
                  values:
                    - "arm64"
      containers:
        - name: container
          image: >-
            image-registry.openshift-image-registry.svc:5000/openshift/httpd:latest
          ports:
            - containerPort: 8080
              protocol: TCP
          env: []
          resources:
              requests:
                cpu: "2"
      imagePullSecrets: []
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
  paused: false

Bug OCPBUGS-18187: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/installer/pull/7518

Bug OCPBUGS-21906: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/containernetworking-plugins/pull/128

Bug OCPBUGS-23516: Monitoring console plugin should avoid browser-caching failures

Description of problem:

~~MON-2967~~ and cmo#1890 moved the Observe console menu into a console plugin (in 4.15? 4.14?). Sometimes If-Modified-Since browser caching causes failures that leave the Observe menu missing, and when the user eventually finds /k8s/cluster/operator.openshift.io~v1~Console/cluster/console-plugins, the failure is rendered as:

Failed to get a valid plugin manifest from /api/plugins/monitoring-plugin/
SyntaxError: Unexpected end of JSON input

This appears to be the result of the browser's If-Modified-Since caching:

$ curl -sH Accept:application/json -H Cache-Control:max-age=0 -H 'Cookie: openshift-session-token=...; login-state=...; ...; csrf-token=...' \
  -H 'If-Modified-Since: Fri, 03 Nov 2023 00:47:45 GMT' -i https://console.build02.ci.openshift.org/api/plugins/monitoring-plugin/plugin-manifest.json
HTTP/1.1 200 OK
date: Tue, 21 Nov 2023 16:52:55 GMT
etag: "65444331-9a2"
last-modified: Fri, 03 Nov 2023 00:47:45 GMT
referrer-policy: strict-origin-when-cross-origin
server: nginx/1.20.1
x-content-type-options: nosniff
x-dns-prefetch-control: off
x-frame-options: DENY
x-xss-protection: 1; mode=block
content-length: 0

While a more recent If-Modified-Since returns populated JSON:

$ curl -sH Accept:application/json -H 'If-Modified-Since: Fri, 10 Nov 2023 10:47:45 GMT' -H 'Cookie: openshift-session-token=...; login-state=...; ...; csrf-token=...' https://console.build02.ci.openshift.org/api/plugins/monitoring-plugin/plugin-manifest.json | jq . | head
{
  "name": "monitoring-plugin",
  "version": "1.0.0",
  "displayName": "OpenShift console monitoring plugin",
  "description": "This plugin adds the monitoring UI to the OpenShift web console",
  "dependencies": {
    "@console/pluginAPI": "*"
  },
  "extensions": [
    {

Disabling caching on the monitoring-plugin side would avoid this issue, but fixing 304 handling in the console's proxy would likely also resolve it.
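
For comparison, a minimal Go sketch of a server that handles conditional requests correctly. It uses the standard library's `http.ServeContent`, which answers a GET whose If-Modified-Since is at or after the content's modification time with 304 Not Modified and no body, instead of the 200-with-empty-body response captured above. The handler, manifest body, and timestamp are illustrative; this is not the monitoring plugin's nginx configuration or the console proxy's code.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// exampleModTime is an illustrative last-modified time for the manifest.
var exampleModTime = time.Date(2023, time.November, 3, 0, 47, 45, 0, time.UTC)

// httpDate formats t the way HTTP date headers expect.
func httpDate(t time.Time) string { return t.UTC().Format(http.TimeFormat) }

// manifestHandler serves a fixed manifest. http.ServeContent sets Last-Modified
// and evaluates If-Modified-Since for GET and HEAD requests: when the header
// is at or after modTime, it writes 304 Not Modified with no body.
func manifestHandler(manifest []byte, modTime time.Time) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.ServeContent(w, r, "plugin-manifest.json", modTime, bytes.NewReader(manifest))
	})
}

// fetch sends a GET to h, adding If-Modified-Since when ims is non-empty.
func fetch(h http.Handler, ims string) *httptest.ResponseRecorder {
	req := httptest.NewRequest(http.MethodGet, "/api/plugins/monitoring-plugin/plugin-manifest.json", nil)
	if ims != "" {
		req.Header.Set("If-Modified-Since", ims)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec
}

func main() {
	h := manifestHandler([]byte(`{"name":"monitoring-plugin"}`), exampleModTime)

	fresh := fetch(h, "")
	fmt.Println(fresh.Code, fresh.Body.String()) // 200 and the manifest

	cached := fetch(h, httpDate(exampleModTime))
	fmt.Println(cached.Code, cached.Body.Len()) // 304 and an empty body
}
```

A browser that receives the 304 reuses its cached copy, so it never tries to parse an empty body as JSON.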

Version-Release number of selected component (if applicable):

Seen in 4.15.0-ec.2. Reproduced in ec.2. Failed to reproduce in ec.1. Possibly a regression from ec.1 to ec.2, although I haven't identified a regressing commit yet.

How reproducible:

Seen multiple times by multiple users in 4.15.0-ec.2 in two long-lived clusters, and also reproduced in an ec.2 Cluster Bot cluster. Likely consistently reproducible on ec.2.

Steps to Reproduce:

1. Install a cluster, e.g. with launch 4.15.0-ec.2 gcp.
2. Log into the console and use the developer tab to get an openshift-session-token value from a successful HTTPS request.
3. Run the following, with your ${TOKEN} and ${HOST}, to confirm 200 responses and find the last-modified value:

$ curl -ksi -H "Cookie: openshift-session-token=${TOKEN}" "https://${HOST}/api/plugins/monitoring-plugin/plugin-manifest.json" | grep 'HTTP\|content-\|last-modified'

4. Run the following, with your ${TOKEN}, ${HOST}, and ${LAST_MODIFIED}:

$ curl -ksi -H "If-Modified-Since: ${LAST_MODIFIED}" -H "Cookie: openshift-session-token=${TOKEN}" "https://${HOST}/api/plugins/monitoring-plugin/plugin-manifest.json"

Actual results:

Observe menu is missing, with browser-console logs like:

Failed to get a valid plugin manifest from /api/plugins/monitoring-plugin/
SyntaxError: Unexpected end of JSON input

200 responses with no content when If-Modified-Since is greater than or equal to the content's last-modified.

Expected results:

Reliably successful loading of the monitoring console plugin, with a 304 when If-Modified-Since is greater than or equal to the content's last-modified.

Possibly more obvious warnings pointing at /k8s/cluster/operator.openshift.io~v1~Console/cluster/console-plugins when plugins fail to load.

Additional info:

Using the browser's development tools to disable caching while loading the console avoids the problematic caching interaction.

https://github.com/openshift/cluster-monitoring-operator/pull/2186

Task HOSTEDCP-1300: Bump k8s.io/client-go to v0.28.3

Bump k8s.io/client-go to v0.28.3

https://github.com/openshift/hypershift/pull/3191

Bug OCPBUGS-19251: Update 4.15 ose-prometheus-adapter image to be consistent with ART

Please review the following PR: https://github.com/openshift/k8s-prometheus-adapter/pull/74

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/k8s-prometheus-adapter/pull/74

Bug OCPBUGS-19510: CI issue with cluster-dns-operator TestCoreDNSDaemonSetReconciliation

Description of problem:

I noticed this in the logs at https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-dns-operator/373/pull-ci-openshift-cluster-dns-operator-master-e2e-aws-ovn-operator/1704287854600916992/build-log.txt:

=== RUN   TestCoreDNSDaemonSetReconciliation
[controller-runtime] log.SetLogger(...) was never called, logs will not be displayed:
goroutine 205 [running]:
runtime/debug.Stack()
	/usr/lib/golang/src/runtime/debug/stack.go:24 +0x65
sigs.k8s.io/controller-runtime/pkg/log.eventuallyFulfillRoot()
	/go/src/github.com/openshift/cluster-dns-operator/vendor/sigs.k8s.io/controller-runtime/pkg/log/log.go:59 +0xbd
sigs.k8s.io/controller-runtime/pkg/log.(*delegatingLogSink).WithName(0xc000061340, {0x182213b, 0x14})
	/go/src/github.com/openshift/cluster-dns-operator/vendor/sigs.k8s.io/controller-runtime/pkg/log/deleg.go:147 +0x4c
github.com/go-logr/logr.Logger.WithName({{0x1aa8468, 0xc000061340}, 0x0}, {0x182213b?, 0x0?})
	/go/src/github.com/openshift/cluster-dns-operator/vendor/github.com/go-logr/logr/logr.go:336 +0x46
sigs.k8s.io/controller-runtime/pkg/client.newClient(0xc000789200, {0x0, 0xc0001a7730, {0x1aa9d90, 0xc00011c700}, 0x0, {0x0, 0x0}, 0x0})
	/go/src/github.com/openshift/cluster-dns-operator/vendor/sigs.k8s.io/controller-runtime/pkg/client/client.go:115 +0xb4
sigs.k8s.io/controller-runtime/pkg/client.New(0xc000789200?, {0x0, 0xc0001a7730, {0x1aa9d90, 0xc00011c700}, 0x0, {0x0, 0x0}, 0x0})
	/go/src/github.com/openshift/cluster-dns-operator/vendor/sigs.k8s.io/controller-runtime/pkg/client/client.go:101 +0x85
github.com/openshift/cluster-dns-operator/pkg/operator/client.NewClient(0x0?)
	/go/src/github.com/openshift/cluster-dns-operator/pkg/operator/client/client.go:52 +0x145
github.com/openshift/cluster-dns-operator/test/e2e.getClient()
	/go/src/github.com/openshift/cluster-dns-operator/test/e2e/utils.go:451 +0x77
github.com/openshift/cluster-dns-operator/test/e2e.TestCoreDNSDaemonSetReconciliation(0xc000501520)
	/go/src/github.com/openshift/cluster-dns-operator/test/e2e/operator_test.go:330 +0x45
testing.tRunner(0xc000501520, 0x193c038)
	/usr/lib/golang/src/testing/testing.go:1576 +0x10b
created by testing.(*T).Run
	/usr/lib/golang/src/testing/testing.go:1629 +0x3ea
    operator_test.go:374: found "foo" node selector on daemonset openshift-dns/dns-default: <nil>
    operator_test.go:378: observed absence of "foo" node selector on daemonset openshift-dns/dns-default: <nil>
--- PASS: TestCoreDNSDaemonSetReconciliation (1.63s)

We need to make a minor change in https://github.com/openshift/cluster-dns-operator/blob/7d2a16c0abf80d09fdcbeef8464994b78aa0589d/test/e2e/operator_test.go#L374-L375
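
Two things produce the noise above. The stack trace is controller-runtime reporting that `log.SetLogger(...)` was never called, so setting a logger before the client is created should silence it. The trailing `<nil>` is what Go's `%v` verb prints for a nil error, which suggests the log lines at operator_test.go:374-375 format an error that is nil on success. A minimal sketch of the second point, assuming that is what those lines do (`describe` and `errExample` are hypothetical):

```go
package main

import (
	"errors"
	"fmt"
)

// errExample is a hypothetical error used to show the non-nil case.
var errExample = errors.New("timed out")

// describe returns msg, appending ": <err>" only when err is non-nil.
// Formatting a nil error with %v would otherwise print "<nil>".
func describe(msg string, err error) string {
	if err != nil {
		return fmt.Sprintf("%s: %v", msg, err)
	}
	return msg
}

func main() {
	msg := `found "foo" node selector on daemonset openshift-dns/dns-default`
	fmt.Printf("%s: %v\n", msg, error(nil)) // reproduces the "...: <nil>" suffix
	fmt.Println(describe(msg, nil))         // no suffix when there is no error
	fmt.Println(describe(msg, errExample))  // suffix only for a real error
}
```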

Version-Release number of selected component (if applicable):

4.15 and earlier

How reproducible:

Be unlucky in CI testing

Steps to Reproduce:

1.
2.
3.

Actual results:

A stack trace, plus log lines that end in <nil>:
    operator_test.go:374: found "foo" node selector on daemonset openshift-dns/dns-default: <nil>
    operator_test.go:378: observed absence of "foo" node selector on daemonset openshift-dns/dns-default: <nil>

Expected results:

No stack trace, and no <nil> printed in the log lines

Additional info:

https://github.com/openshift/cluster-dns-operator/pull/381

Bug MGMT-15559: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/assisted-service/pull/5445

Bug OCPBUGS-24101: Update 4.15 ose-nutanix-cloud-controller-manager-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cloud-provider-nutanix/pull/23

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cloud-provider-nutanix/pull/23

Bug OCPBUGS-18858: Update 4.15 golang-github-openshift-oauth-proxy image to be consistent with ART

Please review the following PR: https://github.com/openshift/oauth-proxy/pull/265

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/oauth-proxy/pull/265

Bug OCPBUGS-19260: Update 4.15 csi-snapshot-validation-webhook image to be consistent with ART

Please review the following PR: https://github.com/openshift/csi-external-snapshotter/pull/106

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/csi-external-snapshotter/pull/106

Bug OCPBUGS-24113: Update 4.15 ose-aws-cluster-api-controllers-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-api-provider-aws/pull/485

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-api-provider-aws/pull/485

Bug OCPBUGS-25772: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-api-provider-libvirt/pull/278

Bug OCPBUGS-9066: Installer should retry when it fails to download the RHCOS image

Description of problem:
From time to time, the installation fails with an error like the following:

2022-01-03 16:33:27.936 | level=debug msg=Generating Terraform Variables...
2022-01-03 16:33:27.940 | level=info msg=Obtaining RHCOS image file from 'https://rhcos-redirector.apps.art.xq1c.p1.openshiftapps.com/art/storage/releases/rhcos-4.8/48.84.202109241901-0/x86_64/rhcos-48.84.202109241901-0-openstack.x86_64.qcow2.gz?sha256=e0a1d8a99c5869150a56b8de475ea7952ca2fa3aacad7ca48533d1176df503ab'
2022-01-03 16:33:27.943 | level=fatal msg=failed to fetch Terraform Variables: failed to generate asset "Terraform Variables": failed to get openstack Terraform variables: Get "https://rhcos-redirector.apps.art.xq1c.p1.openshiftapps.com/art/storage/releases/rhcos-4.8/48.84.202109241901-0/x86_64/rhcos-48.84.202109241901-0-openstack.x86_64.qcow2.gz": dial tcp: lookup rhcos-redirector.apps.art.xq1c.p1.openshiftapps.com on 10.46.0.31:53: read udp 172.16.40.23:38673->10.46.0.31:53: i/o timeout
2022-01-03 16:33:27.946 |

Version:
4.8.0-0.nightly-2021-12-23-010813, but we see it with other versions as well
IPI

I expect the installer to have some sort of retry mechanism.
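
A minimal Go sketch of such a retry mechanism: exponential backoff around a transient failure like the DNS lookup timeout above. `retryWithBackoff`, its parameters, and `errTransient` are hypothetical; this is not the installer's actual implementation.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient stands in for a transient failure such as a DNS i/o timeout.
var errTransient = errors.New("i/o timeout")

// retryWithBackoff calls fn up to attempts times, sleeping between failures
// and doubling the wait each time. It returns nil on the first success,
// otherwise the last error wrapped with the attempt count.
func retryWithBackoff(attempts int, initial time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	wait := initial
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(wait)
			wait *= 2
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := retryWithBackoff(5, 100*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errTransient // e.g. the image download's DNS lookup timing out
		}
		return nil
	})
	fmt.Println(calls, err) // 3 <nil>
}
```

Doubling the wait keeps the first retries fast while giving a flaky resolver or redirector more time on later attempts; a real implementation would likely also cap the total wait.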

https://github.com/openshift/installer/pull/7106

Bug OCPBUGS-20058: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/installer/pull/7549

Bug OCPBUGS-23110: [CI-Watcher] Disable Pipelines E2E Tests

Description of problem:

Pipeline E2E tests have been disabled as the CI is failing.

The likely cause is that our clusters now report version 4.15, and the operator cannot be found because it is only compatible with 4.x through 4.14.

https://github.com/openshift/console/pull/13318

Bug OCPBUGS-27025: Missing .snyk files

We are missing .snyk files, so Snyk scans report false positives.

https://github.com/openshift/cloud-provider-openstack/pull/262

Bug OCPBUGS-19184: Update 4.15 cluster-storage-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-storage-operator/pull/398

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-storage-operator/pull/398

Bug OCPBUGS-9340: oc adm upgrade runs default case for incorrect subcommand

Description of problem:

`oc adm upgrade` does not report an error when given an incorrect subcommand.
This is because the `default` case in `run()` catches every unrecognized subcommand and runs the default behavior (printing the cluster upgrade status) instead.

Version-Release number of selected component (if applicable): 4.10 and current

How reproducible:
Use any incorrect subcommand with `oc adm upgrade`, for example: `oc adm upgrade incorrect-subcommand`

Steps to Reproduce:
1. Run `oc adm upgrade incorrect-subcommand`.

Actual results:
`oc` prints the cluster upgrade status.

Expected results:
`oc` should exit with an error stating that the subcommand is incorrect.
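
The expected behavior can be sketched as a dispatcher that falls back to the status output only when no subcommand is given and rejects anything it does not recognize. This is a Python sketch for illustration, not the `oc` Go code; `run_upgrade`, `UnknownSubcommandError`, and the handler table passed in by the caller are hypothetical, and flag parsing is assumed to have already happened.

```python
from typing import Callable, Mapping, Sequence


class UnknownSubcommandError(Exception):
    """Raised when the first positional argument is not a known subcommand."""


def run_upgrade(
    args: Sequence[str],
    subcommands: Mapping[str, Callable[[Sequence[str]], str]],
    show_status: Callable[[], str],
) -> str:
    # No positional arguments: this is the only case that shows the status.
    if not args:
        return show_status()
    name, rest = args[0], list(args[1:])
    handler = subcommands.get(name)
    if handler is None:
        # Reject unknown input explicitly instead of silently showing status.
        known = ", ".join(sorted(subcommands))
        raise UnknownSubcommandError(
            f"unknown subcommand {name!r}; expected one of: {known}"
        )
    return handler(rest)
```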

https://github.com/openshift/oc/pull/1557

Bug MGMT-16047: InfraEnv accepting cpuArchitecture arm64 causes the converged ZTP flow to break

Description of the problem:

The InfraEnv resource accepts both arm64 and aarch64 as valid cpuArchitecture values, and both result in an ISO URL with arm64 in the path. However, creating the InfraEnv with cpuArchitecture arm64 causes the converged flow to get stuck, because the metal3 PreprovisioningImage resource only accepts aarch64 as an architecture:

  - lastTransitionTime: "2023-10-26T14:46:14Z"
    message: PreprovisioningImage CPU architecture (aarch64) does not match InfraEnv
      CPU architecture (arm64)
    observedGeneration: 2
    reason: InfraEnvArchMismatch
    status: "False"
    type: Ready
  - lastTransitionTime: "2023-10-26T14:46:14Z"
    message: PreprovisioningImage CPU architecture (aarch64) does not match InfraEnv
      CPU architecture (arm64)
    observedGeneration: 2
    reason: InfraEnvArchMismatch
    status: "True"
    type: Error
  networkData: {}

How reproducible:

100%

Steps to reproduce:

1. Create an infraenv with cpuArchitecture: arm64

2. Create BMH resources with the converged flow enabled

Actual results:

PreprovisioningImages report InfraEnvArchMismatch because they only support the aarch64 architecture.

Expected results:

InfraEnv either accepts only the aarch64 cpuArchitecture or correctly translates arm64 to aarch64.

Workaround

The workaround is just to create the InfraEnv resource with cpuArchitecture: aarch64 instead of arm64
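
The "correctly translates arm64 to aarch64" option could look like the following Python sketch, which normalizes both sides before comparing. `normalize_cpu_arch` and `arch_matches` are hypothetical helpers; only the arm64/aarch64 alias comes from this report, and other values pass through unchanged.

```python
def normalize_cpu_arch(arch: str) -> str:
    """Translate the arm64 alias to aarch64; return any other value unchanged."""
    return "aarch64" if arch == "arm64" else arch


def arch_matches(infraenv_arch: str, image_arch: str) -> bool:
    """Compare architectures after normalization, so arm64 matches aarch64."""
    return normalize_cpu_arch(infraenv_arch) == normalize_cpu_arch(image_arch)
```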

https://github.com/openshift/cluster-baremetal-operator/pull/383

Bug OCPBUGS-19247: Update 4.15 csi-node-driver-registrar image to be consistent with ART

Please review the following PR: https://github.com/openshift/csi-node-driver-registrar/pull/49

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/csi-node-driver-registrar/pull/49

Bug OCPBUGS-23776: After PatternFly5 update: Add page > more button dropdown is too wide

Issue 50 from https://docs.google.com/spreadsheets/d/1TR3ENY-GE_LQL9F-xH6NahtRHdu6IrYGhLqDDA8y0EI/edit#gid=1035185624

The dropdown on the Add page no longer wraps its content and overlays other elements when the window is too small.

https://github.com/openshift/console/pull/13361

Bug OCPBUGS-23980: PipelineRun logs not autoscrolling to the bottom of the page

Description:

High-volume PipelineRun/TaskRun logs do not auto-scroll to the bottom of the page.

Steps to reproduce:

1. Create a PipelineRun that produces high-volume log output.
2. Navigate to the Logs page.

Video - https://drive.google.com/file/d/17Dc0ME6KYtkyQmW96lT8J_tMfT-dBRbb/view?usp=drive_link

https://github.com/openshift/console/pull/13377

Bug OCPBUGS-23300: Internal NLB issue (OCPBUGS-9026) causes random failures on HCP private cluster without infra nodes

Description of problem:

This issue has the same root cause as https://issues.redhat.com/browse/OCPBUGS-9026, but I am opening a new one because it has become critical now that ROSA uses an NLB by default since 4.14. HCP (HyperShift) private clusters without infra nodes are hit hardest, because they have only worker nodes and there is currently no workaround.

If the old bug can be used to track the issue, please close this one.

Version-Release number of selected component (if applicable):

4.14.1
HyperShift Private cluster

How reproducible:

100%

Steps to Reproduce:

1. Create a ROSA HCP (HyperShift) cluster.
2. Run qe-e2e-test on this cluster, or curl a route from a pod inside the cluster.

Actual results:

1. The co/console status flaps because the route is only intermittently accessible:
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.1    True        False         4h56m   Error while reconciling 4.14.1: the cluster operator console is not available


2. Check the nodes and the router pods, which run on both worker nodes:
$ oc get node
NAME                          STATUS   ROLES    AGE    VERSION
ip-10-0-49-184.ec2.internal   Ready    worker   5h5m   v1.27.6+f67aeb3
ip-10-0-63-210.ec2.internal   Ready    worker   5h8m   v1.27.6+f67aeb3

$ oc -n openshift-ingress get pod -owide
NAME                              READY   STATUS    RESTARTS   AGE    IP           NODE                          NOMINATED NODE   READINESS GATES
router-default-86d569bf84-bq66f   1/1     Running   0          5h8m   10.130.0.7   ip-10-0-49-184.ec2.internal   <none>           <none>
router-default-86d569bf84-v54hp   1/1     Running   0          5h8m   10.128.0.9   ip-10-0-63-210.ec2.internal   <none>           <none>

3. Check the IngressController LB setting; it uses an internal NLB:

spec:
  endpointPublishingStrategy:
    loadBalancer:
      dnsManagementPolicy: Managed
      providerParameters:
        aws:
          networkLoadBalancer: {}
          type: NLB
        type: AWS
      scope: Internal
    type: LoadBalancerService

4. Keep curling the route from a pod inside the cluster:
$ oc rsh console-operator-86786df488-w6fks
Defaulted container "console-operator" out of: console-operator, conversion-webhook-server

sh-4.4$ curl https://console-openshift-console.apps.rosa.ci-rosa-h-d53b.ptk5.p3.openshiftapps.com -k -I
HTTP/1.1 200 OK

sh-4.4$ curl https://console-openshift-console.apps.rosa.ci-rosa-h-d53b.ptk5.p3.openshiftapps.com -k -I
Connection timed out

Expected results:

1. co/console should be stable, and curling the console route should always succeed.
2. qe-e2e-test should not fail

Additional info:

qe-e2e-test on the cluster:

https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/pr-logs/pull/openshift_release/45369/rehearse-45369-periodic-ci-openshift-openshift-tests-private-release-4.14-amd64-stable-aws-rosa-sts-hypershift-sec-guest-prod-private-link-full-f2/1724307074235502592

https://github.com/openshift/console-operator/pull/815

Bug OCPBUGS-24359: oc-mirror with v2 will create more data compared with v1 format

https://github.com/openshift/oc-mirror/pull/762

Bug OCPBUGS-22104: Clicking on an log based alerts redirects to prometheus metrics

Description of problem:

Version-Release number of selected component (if applicable):

OCP 4.14

Logging 5.8

How reproducible:

Always

Steps to Reproduce:

1. Install the CLO and the Loki Operator with log-based alerts enabled.
2. Go to Observe -> Alerts and select a log-based alert.
3. Click the metric displayed on the alert details page.

Actual results:

The user is redirected to Observe -> Metrics, and the chart displays no metrics because they are not stored in Prometheus.

Expected results:

The user should be redirected to Observe -> Logs, and the metric should be displayed instead of the log list: see ~~OU-267~~

https://github.com/openshift/monitoring-plugin/pull/78

Bug OCPBUGS-21594: mapi_current_pending_csr metric firing when non-mapi CSRs are present

Description of problem:

The MAPI metric mapi_current_pending_csr fires even when there are no pending MAPI CSRs, as long as non-MAPI CSRs are present. The metric may not be correctly scoped to MAPI's own CSRs.

Version-Release number of selected component (if applicable):

Observed in 4.11.25

How reproducible:

Consistent

Steps to Reproduce:

1. Install a component that uses CSRs (like ACM) but leave its CSRs in a pending state.
2. Observe that the metric fires.

Actual results:

Metric is firing

Expected results:

The metric fires only if MAPI-specific CSRs are pending.

Additional info:

This impacts SRE alerting
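
One possible way to scope the metric is to count only pending CSRs addressed to the kubelet signers. This Python sketch assumes that "MAPI CSRs" means CSRs with the `kubernetes.io/kube-apiserver-client-kubelet` or `kubernetes.io/kubelet-serving` signer; the real machine approver may also check the requesting user, and `CSR` and `current_pending_csr` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable

# Signers for kubelet client and serving certificates. Treating these as the
# "MAPI" CSRs is an assumption made for this sketch.
KUBELET_SIGNERS = frozenset({
    "kubernetes.io/kube-apiserver-client-kubelet",
    "kubernetes.io/kubelet-serving",
})


@dataclass(frozen=True)
class CSR:
    name: str
    signer_name: str
    approved: bool = False
    denied: bool = False

    @property
    def pending(self) -> bool:
        # Simplification: a CSR is pending until it is approved or denied.
        return not (self.approved or self.denied)


def current_pending_csr(csrs: Iterable[CSR]) -> int:
    """Count pending CSRs, ignoring those issued for other signers."""
    return sum(1 for c in csrs if c.pending and c.signer_name in KUBELET_SIGNERS)
```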

https://github.com/openshift/cluster-machine-approver/pull/208

Bug OCPBUGS-21823: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/route-controller-manager/pull/32

Bug OCPBUGS-19279: Update 4.15 ose-etcd image to be consistent with ART

Please review the following PR: https://github.com/openshift/etcd/pull/215

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/etcd/pull/215

Bug OCPBUGS-19628: nodeip-configuration doesn't log to serial console

Description of problem:

The nodeip-configuration service does not log to the serial console, which makes it difficult to debug problems when networking is not available and there is no access to the node.

Version-Release number of selected component (if applicable):

Reported against 4.13, but present in all releases

https://github.com/openshift/machine-config-operator/pull/3927

Bug OCPBUGS-22369: Ccoctl create Azure Workload Identity resource does not work properly in eastus region because the storage account does not allow Public access.

Description of problem:

The default security settings for new Azure Storage accounts are being updated. As a result, using ccoctl to create Azure Workload Identity resources in the eastus region does not work.

I tested several commonly used regions, with the following results.

Regions that do not work: eastus

$ az storage account list -g mihuangtt0947-rg-oidc --query "[].[name,allowBlobPublicAccess]" -o tsv
mihuangtt0947rgoidc False


Regions that work: westus, australiacentral, australiaeast, centralus, australiasoutheast, southindia…

$ az storage account list -g mihuangdispri0929-rg-oidc --query "[].[name,allowBlobPublicAccess]" -o tsv
mihuangdispri0929rgoidc	True

Version-Release number of selected component (if applicable):

4.14/4.15

How reproducible:

Always

Steps to Reproduce:

1. Run the ccoctl azure create-all command to create Azure Workload Identity resources in the eastus region:

[huangmingxia@fedora CCO-bugs]$ ./ccoctl azure create-all  --name 'mihuangp1' --region 'eastus' --subscription-id  {SUBSCRIPTION-ID} --tenant-id {TENANNT-ID} --credentials-requests-dir=./credrequests --dnszone-resource-group-name 'os4-common' --storage-account-name='mihuangp1oidc' --output-dir test

Actual results:

[huangmingxia@fedora CCO-bugs]$  ./ccoctl azure create-all  --name 'mihuangp1' --region 'eastus' --subscription-id  {SUBSCRIPTION-ID} --tenant-id {TENANNT-ID} --credentials-requests-dir=./credrequests --dnszone-resource-group-name 'os4-common' --storage-account-name='mihuangp1oidc' --output-dir test
2023/10/25 11:14:36 Using existing RSA keypair found at test/serviceaccount-signer.private
2023/10/25 11:14:36 Copying signing key for use by installer
2023/10/25 11:14:36 No --oidc-resource-group-name provided, defaulting OIDC resource group name to mihuangp1-oidc
2023/10/25 11:14:36 No --installation-resource-group-name provided, defaulting installation resource group name to mihuangp1
2023/10/25 11:14:36 No --blob-container-name provided, defaulting blob container name to mihuangp1
2023/10/25 11:14:39 Created resource group /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/mihuangp1-oidc
2023/10/25 11:15:01 Created storage account /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/mihuangp1-oidc/providers/Microsoft.Storage/storageAccounts/mihuangp1oidc
2023/10/25 11:15:03 failed to create blob container: PUT https://management.azure.com/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/mihuangp1-oidc/providers/Microsoft.Storage/storageAccounts/mihuangp1oidc/blobServices/default/containers/mihuangp1--------------------------------------------------------------------------------RESPONSE 409: 409 ConflictERROR CODE: PublicAccessNotPermitted--------------------------------------------------------------------------------{  "error": {    "code": "PublicAccessNotPermitted",    "message": "Public access is not permitted on this storage account.\nRequestId:415c51f1-c01e-0017-7ef1-06ec0c000000\nTime:
2023-10-25T03:15:02.7928767Z"  }}--------------------------------------------------------------------------------

$ az storage account list -g mihuangtt0947-rg-oidc --query "[].[name,allowBlobPublicAccess]" -o tsv
mihuangtt0947rgoidc False

Expected results:

Resources created successfully.

$ az storage account list -g mihuangtt0947-rg-oidc --query "[].[name,allowBlobPublicAccess]" -o tsv
mihuangtt0947rgoidc True

Additional info:

Google email: Important notice: Default security settings for new Azure Storage accounts will be updated

https://github.com/openshift/cloud-credential-operator/pull/610

Bug OCPBUGS-18113: CPMS failure domains should be omitted when a single failure domain is present

Description of problem:

When the installer generates a CPMS, it should add the `failureDomains` field only when there is more than one failure domain. When there is only one failure domain, its fields (e.g. the zone) should be injected directly into the provider spec and the failure domain should be omitted.

This means we no longer need failure domain injection logic for single-zone clusters, potentially avoiding bugs (such as some we have seen recently).

IIRC we already did this for OpenStack, but it may not have been done for AWS, Azure and GCP.
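
The proposed rule can be sketched in Python as follows. The dictionary shape (`providerSpec`, `failureDomains`) is a simplified, hypothetical stand-in for the real CPMS types, and `build_cpms_template` is not an installer function.

```python
import copy
from typing import Any

Spec = dict[str, Any]


def build_cpms_template(provider_spec: Spec, failure_domains: list[Spec]) -> Spec:
    """Return a CPMS template, omitting failureDomains when there is at most one.

    With a single failure domain, its fields (e.g. the zone) are copied straight
    into the provider spec, so no failure domain injection is needed later.
    """
    spec = copy.deepcopy(provider_spec)  # never mutate the caller's spec
    template: Spec = {"providerSpec": spec}
    if len(failure_domains) > 1:
        template["failureDomains"] = copy.deepcopy(failure_domains)
    elif failure_domains:
        spec.update(failure_domains[0])
    return template
```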

How reproducible:

This can be demonstrated on Azure in the westus region, which has no availability zones. Currently the installer creates the following, which could be omitted entirely:
```
failureDomains:
  platform: Azure
  azure:
  - zone: ""
```

https://github.com/openshift/installer/pull/7448

Bug OCPBUGS-24139: Update 4.15 csi-attacher-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/csi-external-attacher/pull/65

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/csi-external-attacher/pull/65

Task MON-3422: Remove temporary no more needed code

https://github.com/openshift/cluster-monitoring-operator/pull/2132

Bug OCPBUGS-22528: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Bug OCPBUGS-24074: Update 4.15 ose-vsphere-cluster-api-controllers-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-api-provider-vsphere/pull/26

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-api-provider-vsphere/pull/26

Bug MGMT-16001: Agent stuck in unbinding-pending-user-action after scaling down an ipv6 hypershift cluster

Description of the problem:

I installed an IPv6 disconnected agent-based hosted cluster and added 3 workers to it using the boot-it-yourself flow. After scaling the NodePool down to 2 replicas, the agent that should be unbound is stuck in the unbinding-pending-user-action state:

    state: unbinding-pending-user-action
    stateInfo: Host is waiting to be unbound from the cluster

How reproducible:

100%

Actual results:

The agent is stuck in the unbinding-pending-user-action state.

Expected results:

The agent reaches the known-unbound state.

Bug OCPBUGS-15844: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/installer/pull/7540

Bug OCPBUGS-22200: Workers fail to join cluster if metadata service is temporarily unavailable on first boot

This was originally reported on AWS (details below), but the OpenStack configuration suffers from the same issue. If the metadata query for the instance name fails on first boot, kubelet starts with an invalid node name and fails to come up.

Description of problem:

Worker CSRs are pending, so no worker nodes are available.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-10-06-234925

How reproducible:

Always

Steps to Reproduce:

Create a cluster with the aws-c2s-ipi-disconnected-private-fips profile.

Actual results:

Worker CSRs are pending.

Expected results:

Workers should be up and running, with all CSRs approved.

Additional info:

The cluster-machine-approver logs show “failed to find machine for node ip-10-143-1-120”.

It seems the node names should look like
“ip-10-143-1-120.ec2.internal”

The check fails here: https://github.com/openshift/cluster-machine-approver/blob/master/pkg/controller/csr_check.go#L263

Must-gather - https://drive.google.com/file/d/15tz9TLdTXrH6bSBSfhlIJ1l_nzeFE1R3/view?usp=sharing

cluster - https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/238922/

template for installation - https://gitlab.cee.redhat.com/aosqe/flexy-templates/-/blob/master/functionality-testing/aos-4_14/ipi-on-aws/versioned-installer-customer_vpc-disconnected_private_cluster-fips-c2s-ci

cc Yunfei Jiang, Zhaohua Sun

https://github.com/openshift/machine-config-operator/pull/3990

Task MON-3533: Remove e2e plugins from cluster:kube_persistentvolume_plugin_type_counts:sum

Similar to what has been done in ~~MON-3484~~.

https://github.com/openshift/cluster-monitoring-operator/pull/2171

Bug OCPBUGS-21781: [gcp] please clarify what's wrong with the userLabel key "a"

Description of problem:

Setting the key "a" in platform.gcp.userLabels produces an error message that does not explain exactly what is wrong.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-10-15-164249

How reproducible:

Always

Steps to Reproduce:

1. Run "create install-config".
2. Edit install-config.yaml to insert the userLabels settings (see [1]).
3. Run "create cluster".

Actual results:

An error message reports that the label key "a" is invalid.

Expected results:

There should be no error, according to the statement "A label key can have a maximum of 63 characters and cannot be empty. Label must begin with a lowercase letter, and must contain only lowercase letters, numeric characters, and the following special characters `_-`".
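
The quoted rule can be written as a single regular expression. In this Python sketch (a hypothetical helper, not the installer's validation code), the key "a" is accepted, which suggests that either the installer's check or its error message does not match the documented rule.

```python
import re

# Encodes the documented rule: 1 to 63 characters, starts with a lowercase
# letter, and contains only lowercase letters, digits, "_" and "-".
_LABEL_KEY_RE = re.compile(r"[a-z][a-z0-9_-]{0,62}")


def is_valid_gcp_label_key(key: str) -> bool:
    return _LABEL_KEY_RE.fullmatch(key) is not None
```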

Additional info:

$ openshift-install version
openshift-install 4.14.0-0.nightly-2023-10-15-164249
built from commit 359866f9f6d8c86e566b0aea7506dad22f59d860
release image registry.ci.openshift.org/ocp/release@sha256:3c5976a39479e11395334f1705dbd3b56580cd1dcbd514a34d9c796b0a0d9f8e
release architecture amd64
$ openshift-install explain installconfig.platform.gcp.userLabels
KIND:     InstallConfig
VERSION:  v1

RESOURCE: <[]object>
  userLabels has additional keys and values that the installer will add as labels to all resources that it creates on GCP. Resources created by the cluster itself may not include these labels. This is a TechPreview feature and requires setting CustomNoUpgrade featureSet with GCPLabelsTags featureGate enabled or TechPreviewNoUpgrade featureSet to configure labels.

FIELDS:
    key <string> -required-
      key is the key part of the label. A label key can have a maximum of 63 characters and cannot be empty. Label must begin with a lowercase letter, and must contain only lowercase letters, numeric characters, and the following special characters `_-`.
    value <string> -required-
      value is the value part of the label. A label value can have a maximum of 63 characters and cannot be empty. Value must contain only lowercase letters, numeric characters, and the following special characters `_-`.

$ 

[1]
$ yq-3.3.0 r test12/install-config.yaml platform
gcp:
  projectID: openshift-qe
  region: us-central1
  userLabels:
  - key: createdby
    value: installer-qe
  - key: a
    value: hello
$ yq-3.3.0 r test12/install-config.yaml featureSet
TechPreviewNoUpgrade
$ yq-3.3.0 r test12/install-config.yaml credentialsMode
Passthrough
$ openshift-install create cluster --dir test12
ERROR failed to fetch Metadata: failed to load asset "Install Config": failed to create install config: invalid "install-config.yaml" file: platform.gcp.userLabels[a]: Invalid value: "hello": label key is invalid or contains invalid characters. Label key can have a maximum of 63 characters and cannot be empty. Label key must begin with a lowercase letter, and must contain only lowercase letters, numeric characters, and the following special characters `_-` 
$

Bug OCPBUGS-25700: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/ovn-kubernetes/pull/1994

Bug OCPBUGS-24151: Update 4.15 ose-machine-api-operator-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/machine-api-operator/pull/1179

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/machine-api-operator/pull/1179

Bug OCPBUGS-16905: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/machine-config-operator/pull/3895

Bug OCPBUGS-18396: CI: MTU migraton failures in 4.14

CI is failing almost permanently on MTU migration in 4.14 (both SDN and OVN-Kubernetes):

https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-cluster-network-operator-master-e2e-network-mtu-migration-sdn-ipv4

https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-cluster-network-operator-master-e2e-network-mtu-migration-ovn-ipv4

The common issue appears to be a timeout while waiting for the MCO:

+ echo '[2023-08-31T03:58:16+00:00] Waiting for final Machine Controller Config...'
[2023-08-31T03:58:16+00:00] Waiting for final Machine Controller Config...
+ timeout 900s bash
migration field is not cleaned by MCO
migration field is not cleaned by MCO
...

https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/1979/pull-ci-openshift-cluster-network-operator-master-e2e-network-mtu-migration-sdn-ipv4/1697077984948654080/build-log.txt

https://github.com/openshift/cluster-network-operator/pull/2021

Bug OCPBUGS-14257: coreos-installer iso kargs show broken on Agent ISO

Running `coreos-installer iso kargs show` no longer works with the 4.13 Agent ISO. Instead, it fails with this error:

$ coreos-installer iso kargs show agent.x86_64.iso
Writing manifest to image destination
Storing signatures
Error: No karg embed areas found; old or corrupted CoreOS ISO image.

This is almost certainly due to the way we repack the ISO as part of embedding the agent-tui binary in it.

It worked fine in 4.12. I have tested both releases with every version of coreos-installer from 0.14 to 0.17.

https://github.com/openshift/installer/pull/7896

Bug OCPBUGS-20517: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/operator-framework-catalogd/pull/29

Bug OCPBUGS-18247: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-kube-apiserver-operator/pull/1542

Bug OCPBUGS-24956: Installer should have a pre-check which prevents installation on non-BareMetal platforms without the CloudCredential cap

The Cloud Credential operator was made optional in OCP 4.15, see https://issues.redhat.com/browse/OCPEDGE-69. The CloudCredential cap was added as a new capability.

However, for OCP 4.15 the disablement of CCO is only supported on BareMetal platforms, see https://issues.redhat.com/browse/OCPEDGE-69?focusedId=23595076&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-23595076.

We propose guarding against installations on non-BareMetal platforms without the CloudCredential cap, which could be implemented similarly to https://issues.redhat.com/browse/OCPBUGS-15659.
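
A minimal sketch of such a pre-check, in Python for illustration. It assumes the platform is identified by its install-config name (`baremetal`) and that the enabled capabilities have already been resolved into a set; `validate_cloud_credential_capability` is a hypothetical name, not an installer function.

```python
def validate_cloud_credential_capability(platform: str, enabled_capabilities: set[str]) -> None:
    """Reject installs that disable CloudCredential on a non-bare-metal platform.

    Raises ValueError with an explanatory message; returns None when valid.
    """
    if platform != "baremetal" and "CloudCredential" not in enabled_capabilities:
        raise ValueError(
            "disabling the CloudCredential capability is only supported on the "
            f"baremetal platform, not {platform!r}"
        )
```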

Bug OCPBUGS-9157: ‘Create Pod’ button should be disabled for normal user without any projects on pods list page

Description of problem:
A 'Restricted Access' error message and a 'Create Pod' button are shown on the Pods page for a normal user without any projects.

Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-03-04-063157

How reproducible:
Always

Steps to Reproduce:
1. Log in to OCP as a normal user and navigate to the Pods page.
2. Check whether the 'No Pods found' message is shown and the 'Create Pod' button is hidden.

Actual results:
2. A 'Restricted Access' error message and an enabled 'Create Pod' button are shown on the Pods page.

Expected results:
2. The 'No Pods found' message should be shown, and the 'Create Pod' button should be hidden.

Additional info:
The corresponding behavior on the Deployment, StatefulSet, Job, and Service pages is correct.

https://github.com/openshift/console/pull/13040

Bug OCPBUGS-22107: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-api-provider-ibmcloud/pull/61

Bug OCPBUGS-25864: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-network-operator/pull/2180

Task HOSTEDCP-1306: Bump Golang to v1.20 in Containerfile.operator for RHTAP

https://github.com/openshift/hypershift/pull/3196

Bug OCPBUGS-24122: Update 4.15 ose-alibaba-cloud-csi-driver-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/alibaba-cloud-csi-driver/pull/42

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/alibaba-cloud-csi-driver/pull/42

Bug OCPBUGS-18115: PrometheusOperatorRejectedResources alert fires on Hypershift clusters with user-defined monitoring

Description of problem:

After enabling user-defined monitoring on a HyperShift hosted cluster, the PrometheusOperatorRejectedResources alert starts firing.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Start a HyperShift hosted cluster with cluster-bot.
2. Enable user-defined monitoring.

Actual results:

The PrometheusOperatorRejectedResources alert starts firing.

Expected results:

No alert firing

Additional info:

We need to reach out to the HyperShift team, as the fix probably belongs in their code base.

https://github.com/openshift/cluster-openshift-apiserver-operator/pull/551

Bug OCPBUGS-19987: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-etcd-operator/pull/1134

Bug OCPBUGS-20407: Builder fails to expose repository secrets for RUN

Description of problem:

When setting up transient mounts, which expose CA certificates and RPM package repositories to a build, a recent change to the builder replaced simple bind mounts with overlay mounts. While this may have made things easier for unprivileged builds, we overlooked that overlay mounts can only be made over directories, not files, so the change needs to be reverted.
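The directory-versus-file constraint can be sketched as a mount-type decision. This is a minimal, hypothetical Python helper (`choose_mount_type` is illustrative only; the builder itself is written in Go and may structure this differently) that keeps overlay mounts for directories and falls back to a plain bind mount for single files such as redhat.repo.

```python
import os
import tempfile


def choose_mount_type(source: str) -> str:
    """Pick a transient mount type for a source path (hypothetical helper).

    Overlay mounts only work over directories, so a single file such as
    redhat.repo has to be bind-mounted instead.
    """
    if os.path.isdir(source):
        return "overlay"
    if os.path.isfile(source):
        return "bind"
    raise FileNotFoundError(source)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo_file = os.path.join(tmp, "redhat.repo")
        with open(repo_file, "w") as f:
            f.write("[example]\n")
        print(choose_mount_type(tmp))        # overlay
        print(choose_mount_type(repo_file))  # bind
```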

Version-Release number of selected component (if applicable):

4.14.0

How reproducible:

Always

Steps to Reproduce:

Per https://redhat-internal.slack.com/archives/C014MHHKUSF/p1696882408656359?thread_ts=1696882334.352129&cid=C014MHHKUSF,
1. oc new-app -l app=pvg-nodejs --name pvg-nodejs pvg-nodejs https://github.com/openshift/nodejs-ex.git

Actual results:

mount /var/lib/containers/storage/overlay-containers/9c3877f3062cc18b01f30db310e0e2bd0a1cd4527d74f41c313399e48fa81d23/userdata/overlay/145259665/merge:/run/secrets/redhat.repo (via /proc/self/fd/6), data: lowerdir=/tmp/redhat.repo-copy2014834134/redhat.repo,upperdir=/var/lib/containers/storage/overlay-containers/9c3877f3062cc18b01f30db310e0e2bd0a1cd4527d74f41c313399e48fa81d23/userdata/overlay/145259665/upper,workdir=/var/lib/containers/storage/overlay-containers/9c3877f3062cc18b01f30db310e0e2bd0a1cd4527d74f41c313399e48fa81d23/userdata/overlay/145259665/work: *invalid argument*"

Expected results:

The transient mount of the redhat.repo file for a RUN instruction is set up successfully.

Additional info:

Bug introduced in https://github.com/openshift/builder/pull/349, should be fixed in https://github.com/openshift/builder/pull/359.

https://github.com/openshift/builder/pull/359

Bug OCPBUGS-24150: Update 4.15 ose-cluster-capi-operator-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-capi-operator/pull/147

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-capi-operator/pull/147

Bug OCPBUGS-13348: Hypershift Audit configuration not working for Hypershift HostedCluster

Description of problem:

Adding an audit configuration to a HyperShift HostedCluster does not work as expected.

Version-Release number of selected component (if applicable):

# oc get clusterversions.config.openshift.io
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2023-05-04-090524   True        False         15m     Cluster version is 4.13.0-0.nightly-2023-05-04-090524

How reproducible:

Always

Steps to Reproduce:

1. Get the HyperShift hosted cluster name from the management cluster.

# hostedcluster=$( oc get -n clusters hostedclusters -o json | jq -r .items[].metadata.name)  

2. Apply an audit profile to the HyperShift hosted cluster.
# oc patch HostedCluster $hostedcluster -n clusters -p '{"spec": {"configuration": {"apiServer": {"audit": {"profile": "WriteRequestBodies"}}}}}' --type merge     
hostedcluster.hypershift.openshift.io/85ea85757a5a14355124 patched 

# oc get HostedCluster $hostedcluster -n clusters -ojson | jq .spec.configuration.apiServer.audit        
{
  "profile": "WriteRequestBodies"
}

3. Check whether the pods or operators restart to apply the configuration change.

# oc get pods -l app=kube-apiserver  -n clusters-${hostedcluster}
NAME                              READY   STATUS    RESTARTS   AGE
kube-apiserver-7c98b66949-9z6rw   5/5     Running   0          36m
kube-apiserver-7c98b66949-gp5rx   5/5     Running   0          36m
kube-apiserver-7c98b66949-wmk8x   5/5     Running   0          36m

# oc get pods -l app=openshift-apiserver   -n clusters-${hostedcluster}
NAME                                  READY   STATUS    RESTARTS   AGE
openshift-apiserver-dc4c84ff4-566z9   3/3     Running   0          29m
openshift-apiserver-dc4c84ff4-99zq9   3/3     Running   0          29m
openshift-apiserver-dc4c84ff4-9xdrz   3/3     Running   0          30m

4. Check the generated audit log.
# NOW=$(date -u "+%s"); echo "$NOW"; echo "$NOW" > now
1683711189

# kaspod=$(oc get pods -l app=kube-apiserver -n clusters-${hostedcluster} --no-headers -o=jsonpath={.items[0].metadata.name})                                     

# oc logs $kaspod -c audit-logs -n clusters-${hostedcluster} > kas-audit.log                                                                                      
# cat kas-audit.log | grep -iE '"verb":"(get|list|watch)","user":.*(requestObject|responseObject)' | jq -c 'select (.requestReceivedTimestamp | .[0:19] + "Z" | fromdateiso8601 > '"`cat now`)" | wc -l
0

# cat kas-audit.log | grep -iE '"verb":"(create|delete|patch|update)","user":.*(requestObject|responseObject)' | jq -c 'select (.requestReceivedTimestamp | .[0:19] + "Z" | fromdateiso8601 > '"`cat now`)" | wc -l
0  

None of the results should be zero.
In the backend, the configuration should be applied, or the pods/operators should restart after the configuration change.
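The two grep/jq pipelines above can be approximated in one Python sketch. It assumes the audit log is JSON Lines (one event per line) and, unlike the grep regex, it does not require `"user":` to follow the verb directly; `count_events` and the sample events are hypothetical illustrations, not real log data.

```python
import json
from datetime import datetime, timezone

READ_VERBS = {"get", "list", "watch"}
WRITE_VERBS = {"create", "delete", "patch", "update"}


def count_events(lines, verbs, since_epoch):
    """Count events with a matching verb that carry a request or response
    body and were received after since_epoch (seconds, UTC)."""
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON noise in the container log
        if event.get("verb") not in verbs:
            continue
        if "requestObject" not in event and "responseObject" not in event:
            continue
        # Same truncation as the jq filter: keep "YYYY-MM-DDTHH:MM:SS".
        ts = event.get("requestReceivedTimestamp", "")[:19]
        if not ts:
            continue
        received = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
        if received.timestamp() > since_epoch:
            count += 1
    return count


# Hypothetical sample events; `now` is the value printed by the `date -u "+%s"` step.
sample = [
    '{"verb":"create","requestReceivedTimestamp":"2023-05-10T09:40:00.000000Z","requestObject":{}}',
    '{"verb":"get","requestReceivedTimestamp":"2023-05-10T09:40:00.000000Z"}',
]
now = 1683711189
print(count_events(sample, WRITE_VERBS, now))  # 1
print(count_events(sample, READ_VERBS, now))   # 0
```

With a WriteRequestBodies profile in effect, the write-verb count on a real kas-audit.log would be expected to be non-zero.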

Actual results:

The configuration change is not applied in the backend, and neither the operators nor the pods restart.

Expected results:

The configuration should be applied, and the pods and operators should restart after the configuration change.

Additional info:

https://github.com/openshift/hypershift/pull/3014

Bug OCPBUGS-18860: Update 4.15 openshift-enterprise-base-rhel9 image to be consistent with ART

Please review the following PR: https://github.com/openshift/images/pull/149

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/images/pull/149

Bug OCPBUGS-19708: MCO does not create duplicated kernel arguments

Description of problem:

When we create an MC that declares the same kernel argument twice, the MCO adds it only once.

Version-Release number of selected component (if applicable):

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2023-09-22-181920   True        False         5h18m   Cluster version is 4.12.0-0.nightly-2023-09-22-181920

We have seen this behavior in 4.15 too (4.15.0-0.nightly-2023-09-22-224720).

How reproducible:

Always

Steps to Reproduce:

1. Create an MC that declares the same kernel argument twice (z=4 is duplicated):

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: test-kernel-arguments-32-zparam
spec:
  config:
    ignition:
      version: 3.2.0
  kernelArguments:
    - y=0
    - z=4
    - y=1
    - z=4

Actual results:

We get the following parameters

$ oc debug -q node/sergio-v12-9vwrc-worker-c-tpbvh.c.openshift-qe.internal  -- chroot /host cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-a594b3a14778ce39f2b42ddb90e933c1971268a746ef1678a3c6eedee5a21b00/vmlinuz-4.18.0-372.73.1.el8_6.x86_64 ostree=/ostree/boot.0/rhcos/a594b3a14778ce39f2b42ddb90e933c1971268a746ef1678a3c6eedee5a21b00/0 ignition.platform.id=gcp console=ttyS0,115200n8 root=UUID=e101e976-e029-411d-ad71-6856f3838c4f rw rootflags=prjquota boot=UUID=75598fe5-c10d-4e95-9747-1708d9fe6a10 console=tty0 y=0 z=4 y=1

There is only one "z=4" parameter. We should see "y=0 z=4 y=1 z=4" instead of "y=0 z=4 y=1"

Expected results:

In older versions, the duplicated parameters are created.

For example, this is the output on a 4.9 IPI cluster on AWS:

$ oc debug -q node/ip-10-0-189-69.us-east-2.compute.internal -- chroot /host cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-e1eeff6ec1b9b70a3554779947906f4a7fb93e0d79fbefcb045da550b7d9227f/vmlinuz-4.18.0-305.97.1.el8_4.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ostree=/ostree/boot.1/rhcos/e1eeff6ec1b9b70a3554779947906f4a7fb93e0d79fbefcb045da550b7d9227f/0 ignition.platform.id=aws root=UUID=ed307195-b5a9-4160-8a7a-df42aa734c28 rw rootflags=prjquota y=0 z=4 y=1 z=4


All the parameters are created, including the duplicated "z=4".
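The difference between the two outputs can be reproduced with a small Python sketch. It is a hypothetical illustration only (MCO is written in Go and this is not its code): an order-preserving de-duplication yields the 4.12 command line, while keeping every declared argument yields the 4.9 one.

```python
def dedup_kargs(kargs):
    """Keep only the first occurrence of each argument (the observed 4.12 behavior)."""
    seen = set()
    result = []
    for karg in kargs:
        if karg not in seen:
            seen.add(karg)
            result.append(karg)
    return result


def append_all_kargs(kargs):
    """Keep every declared argument, duplicates included (the 4.9 behavior)."""
    return list(kargs)


declared = ["y=0", "z=4", "y=1", "z=4"]
print(" ".join(dedup_kargs(declared)))       # y=0 z=4 y=1
print(" ".join(append_all_kargs(declared)))  # y=0 z=4 y=1 z=4
```

Note that `y=0` and `y=1` survive de-duplication because they are different strings; only the exact repeat `z=4` is dropped.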

Additional info:

https://github.com/openshift/machine-config-operator/pull/3947

Bug OCPBUGS-19736: After Upgrade to 4.12 rebooted nodes no longer boot

Description of problem:

configure-ovs.sh breaks the primary interface configuration by leaving generated configs in /etc/NetworkManager/system-connections.

Version-Release number of selected component (if applicable):

OCP 4.10.52 -> 4.11.46 -> 4.12.27, IPI on vSphere

How reproducible:

Reboot any node; it never becomes Ready.

Steps to Reproduce:

1. Install and upgrade the cluster.
2. Reboot the worker nodes after the upgrade.

Actual results:

The primary interface never sends a DHCP request, and bad configs are left in /etc/NetworkManager/system-connections.

Expected results:

No leftover configure-ovs configs remain, and the primary interface acquires an IP address using DHCP.

Additional info:

Workaround (only when using a single DHCP interface):
rm /etc/NetworkManager/system-connections/*

https://github.com/openshift/machine-config-operator/pull/3982

Bug OCPBUGS-21593: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Bug OCPBUGS-9719: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/console/pull/10177

Task MON-3548: Bump KSM to v2.10.1

Bump kube-state-metrics (KSM) to the latest v2.10.1 release, which addresses a regression in the previous upstream release and is built with a newer Golang patch version (v1.20.8).

Bug OCPBUGS-24104: Update 4.15 openshift-enterprise-console-operator-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/console-operator/pull/818

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/console-operator/pull/818

Bug OCPBUGS-16920: [ibm-vpc-block-csi-driver] xfs volume snapshot volume mount failed of "Filesystem has duplicate UUID"

Description of problem:

[ibm-vpc-block-csi-driver] Mounting a volume restored from an xfs volume snapshot fails with "Filesystem has duplicate UUID"

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-07-26-132453

How reproducible:

Always

Steps to Reproduce:

1. Install an OpenShift cluster on IBM Cloud.
2. Create a PVC with the ibm-vpc-block CSI storage class and a pod that consumes the PVC.
3. Write some data to the pod's volume and sync.
4. Create a VolumeSnapshot and wait for it to become ReadyToUse.
5. Create a PVC that restores the VolumeSnapshot and a pod that consumes the restored PVC.

Actual results:

In step 5, the volume mount fails with:
07-27 21:36:08.572    Mounting command: mount
07-27 21:36:08.572    Mounting arguments: -t xfs -o defaults /dev/disk/by-id/virtio-0787-6ec22828-ec32-4 /var/lib/kubelet/plugins/kubernetes.io/csi/vpc.block.csi.ibm.io/ecef50d905ba489935099cad29a3773220fec45334e7546951706454894073e7/globalmount
07-27 21:36:08.572    Output: mount: /var/lib/kubelet/plugins/kubernetes.io/csi/vpc.block.csi.ibm.io/ecef50d905ba489935099cad29a3773220fec45334e7546951706454894073e7/globalmount: wrong fs type, bad option, bad superblock on /dev/vde, missing codepage or helper program, or other error.

dmesg shows:
[14530.520622] XFS (vde): Filesystem has duplicate UUID a758102c-fbdd-41ef-b4de-60a546bf554b - can't mount
[14531.119703] XFS (vde): Filesystem has duplicate UUID a758102c-fbdd-41ef-b4de-60a546bf554b - can't mount
[14532.229388] XFS (vde): Filesystem has duplicate UUID a758102c-fbdd-41ef-b4de-60a546bf554b - can't mount
[14534.348809] XFS (vde): Filesystem has duplicate UUID a758102c-fbdd-41ef-b4de-60a546bf554b - can't mount
[14538.396705] XFS (vde): Filesystem has duplicate UUID a758102c-fbdd-41ef-b4de-60a546bf554b - can't mount
[14546.472831] XFS (vde): Filesystem has duplicate UUID a758102c-fbdd-41ef-b4de-60a546bf554b - can't mount
[14562.523028] XFS (vde): Filesystem has duplicate UUID a758102c-fbdd-41ef-b4de-60a546bf554b - can't mount
[14594.636819] XFS (vde): Filesystem has duplicate UUID a758102c-fbdd-41ef-b4de-60a546bf554b - can't mount
[14658.749442] XFS (vde): Filesystem has duplicate UUID a758102c-fbdd-41ef-b4de-60a546bf554b - can't mount
[14780.863678] XFS (vde): Filesystem has duplicate UUID a758102c-fbdd-41ef-b4de-60a546bf554b - can't mount

Expected results:

In step 5, the restored volume should mount successfully and the pod should become Running.

Additional info:

This looks like a bug in the CSI driver: it mounts the volume without `-o nouuid`.
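The missing option can be sketched as follows. A volume restored from a snapshot carries the same XFS UUID as its source, and XFS refuses to mount a second filesystem with a UUID that is already mounted on the node; the standard `nouuid` mount option skips that check. `build_mount_args` is a hypothetical Python helper, not the driver's Go code, and whether the linked PR takes exactly this approach is an assumption.

```python
def build_mount_args(fstype, device, target, options=None):
    """Build `mount` arguments, adding nouuid for xfs volumes (hypothetical helper)."""
    opts = list(options or ["defaults"])
    if fstype == "xfs" and "nouuid" not in opts:
        # A snapshot-restored XFS volume shares its source's UUID;
        # nouuid tells XFS to ignore the duplicate-UUID check.
        opts.append("nouuid")
    return ["-t", fstype, "-o", ",".join(opts), device, target]


print(" ".join(build_mount_args("xfs", "/dev/vde", "/mnt/restored")))
# -t xfs -o defaults,nouuid /dev/vde /mnt/restored
```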

https://github.com/openshift/ibm-vpc-block-csi-driver/pull/45

Bug OCPBUGS-18105: [IBM VPC] failed provisioning volume in proxy cluster

Description of problem:

The IBM VPC CSI driver fails to provision volumes in a proxy cluster. If I understand correctly, the proxy is not injected because our definition (https://github.com/openshift/ibm-vpc-block-csi-driver-operator/blob/master/assets/controller.yaml) injects the proxy into the csi-driver container:
    config.openshift.io/inject-proxy: csi-driver
    config.openshift.io/inject-proxy-cabundle: csi-driver
but the container is actually named iks-vpc-block-driver: https://github.com/openshift/ibm-vpc-block-csi-driver-operator/blob/master/assets/controller.yaml#L153

I checked, and the proxy is not defined in the controller pod or in the driver container's environment.
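This kind of mismatch can be detected mechanically. A minimal Python sketch, assuming the Deployment has already been parsed into a dict (for example from `oc get deployment -o json`) and that the annotation value may list several container names separated by commas (an assumption); `find_unmatched_proxy_targets` is hypothetical, and the names below mirror this report.

```python
PROXY_ANNOTATIONS = (
    "config.openshift.io/inject-proxy",
    "config.openshift.io/inject-proxy-cabundle",
)


def find_unmatched_proxy_targets(deployment):
    """Return (annotation, container) pairs naming a container that does not exist."""
    annotations = deployment.get("metadata", {}).get("annotations", {})
    containers = deployment["spec"]["template"]["spec"].get("containers", [])
    names = {c["name"] for c in containers}
    missing = []
    for key in PROXY_ANNOTATIONS:
        value = annotations.get(key)
        if not value:
            continue
        # Assumption: the value may list several containers, comma-separated.
        for target in (v.strip() for v in value.split(",")):
            if target and target not in names:
                missing.append((key, target))
    return missing


controller = {
    "metadata": {"annotations": {
        "config.openshift.io/inject-proxy": "csi-driver",
        "config.openshift.io/inject-proxy-cabundle": "csi-driver",
    }},
    "spec": {"template": {"spec": {"containers": [{"name": "iks-vpc-block-driver"}]}}},
}
print(find_unmatched_proxy_targets(controller))
```

Both annotations are reported, because no container named csi-driver exists; renaming either side so they match would make the list empty.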

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-08-11-055332

How reproducible:

Always

Steps to Reproduce:

1. Create an IBM Cloud cluster with a proxy configured.
2. Create a PVC and a pod using the IBM VPC CSI driver.

Actual results:

Volume provisioning fails.

Expected results:

Volume provisioning works on a proxy cluster.

Additional info:

https://github.com/openshift/ibm-vpc-block-csi-driver/pull/43

Bug OCPBUGS-19966: Builds - BuildConfigs : i18n misses

Description of problem:

Change the UI to a non-en_US locale.
Navigate to Builds - BuildConfigs.
Open the kebab menu: 'Start last run' is shown in English.

Version-Release number of selected component (if applicable):

4.14.0-rc.2

How reproducible:

Steps to Reproduce:

Actual results:

Content is in English

Expected results:

Content should be localized

Additional info:

Reference screenshot https://drive.google.com/file/d/1XrQwpJxftcsvE8rPGvItTaCZ4Sr1Rj1l/view?usp=sharing

https://github.com/openshift/console/pull/13211

Bug OCPBUGS-20070: CPMSO: Unsupported GCP e2-custom-* instance type in E2E test framework

Description of problem:

GCP e2-custom-* instance type is not supported by our E2E test framework.
Now that the test platform has started using those instance types, our CPMS E2E periodic jobs are permafailing.

Error sample:

• [FAILED] [285.539 seconds]
ControlPlaneMachineSet Operator With an active ControlPlaneMachineSet and the instance type is changed [BeforeEach] should perform a rolling update [Periodic]
  [BeforeEach] /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/test/e2e/periodic_test.go:39
  [It] /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/test/e2e/periodic_test.go:43

  [FAILED] provider spec should be updated with bigger instance size
  Expected success, but got an error:
      <*fmt.wrapError | 0xc000358380>: 
      failed to get next instance size: instance type did not match expected format: e2-custom-6-16384
      {
          msg: "failed to get next instance size: instance type did not match expected format: e2-custom-6-16384",
          err: <*fmt.wrapError | 0xc000358360>{
              msg: "instance type did not match expected format: e2-custom-6-16384",
              err: <*errors.errorString | 0xc0001489f0>{
                  s: "instance type did not match expected format",
              },
          },
      }
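The failure comes from a parser that only recognizes predefined machine types. A hypothetical Python sketch of a parser that also accepts custom types, assuming GCP's `<series>-custom-<vCPUs>-<memoryMB>` naming (as in `e2-custom-6-16384`) alongside predefined names such as `n2-standard-4`. It is not the CPMSO test framework's Go code, and the "double the size" rule for picking the next instance is an illustrative assumption.

```python
import re

CUSTOM = re.compile(r"^(?P<series>[a-z0-9]+)-custom-(?P<cpus>\d+)-(?P<mem>\d+)$")
PREDEFINED = re.compile(r"^(?P<series>[a-z0-9]+)-(?P<kind>[a-z]+)-(?P<cpus>\d+)$")


def next_instance_size(instance_type):
    """Return a larger type by doubling vCPUs (and memory, for custom types)."""
    m = CUSTOM.match(instance_type)
    if m:
        # Doubling both values keeps the memory-per-vCPU ratio unchanged.
        cpus, mem = int(m["cpus"]) * 2, int(m["mem"]) * 2
        return f"{m['series']}-custom-{cpus}-{mem}"
    m = PREDEFINED.match(instance_type)
    if m:
        return f"{m['series']}-{m['kind']}-{int(m['cpus']) * 2}"
    raise ValueError(f"instance type did not match expected format: {instance_type}")


print(next_instance_size("n2-standard-4"))      # n2-standard-8
print(next_instance_size("e2-custom-6-16384"))  # e2-custom-12-32768
```

Types with other suffixes (for example GPU shapes) would still raise, so a real fix would need to cover whatever families the test platform actually uses.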

Version-Release number of selected component (if applicable):

How reproducible:

Always

Steps to Reproduce:

1. Use an e2-custom instance type in a GCP cluster and run the CPMSO E2E periodic tests.

Actual results:

Permafailing E2Es

Expected results:

Successful E2Es

Additional info:

Bug OCPBUGS-16080: File /var/log/kube-apiserver/termination.log for kube-apiserver has too permissive mode

Description of problem:

All files under /var/log/kube-apiserver/ should have 600 permissions, but /var/log/kube-apiserver/termination.log has 644 permissions on some nodes.
$ for node in `oc get node -l node-role.kubernetes.io/control-plane= --no-headers|awk '{print $1}'`;do oc debug node/$node -- chroot /host ls -l /var/log/kube-apiserver/;done
Temporary namespace openshift-debug-gj262 is created for debugging node...
Starting pod/ip-x-us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
total 221752
-rw-------. 1 root root 209714718 Jul 12 05:47 audit-2023-07-12T05-47-16.625.log
-rw-------. 1 root root  13233368 Jul 12 05:54 audit.log
-rw-------. 1 root root    646569 Jul 12 04:19 termination.log
Removing debug pod ...
Temporary namespace openshift-debug-gj262 was removed.
Temporary namespace openshift-debug-cmdgm is created for debugging node...
Starting pod/ip-xus-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
total 49640
-rw-------. 1 root root 49826363 Jul 12 05:54 audit.log
-rw-------. 1 root root   826226 Jul 12 04:23 termination.log
Removing debug pod ...
Temporary namespace openshift-debug-cmdgm was removed.
Temporary namespace openshift-debug-fdqtv is created for debugging node...
Starting pod/ip-xus-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
total 270276
-rw-------. 1 root root 209714252 Jul 12 05:34 audit-2023-07-12T05-34-34.205.log
-rw-------. 1 root root  51250736 Jul 12 05:54 audit.log
-rw-r--r--. 1 root root         4 Jul 12 04:15 termination.log
Removing debug pod ...
Temporary namespace openshift-debug-fdqtv was removed.
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-07-11-092038   True        False         91m     Cluster version is 4.14.0-0.nightly-2023-07-11-092038

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-07-11-092038

How reproducible:

Always

Steps to Reproduce:

1. $ for node in `oc get node -l node-role.kubernetes.io/control-plane= --no-headers|awk '{print $1}'`;do oc debug node/$node -- chroot /host ls -l /var/log/kube-apiserver/;done

Actual results:

File /var/log/kube-apiserver/termination.log for kube-apiserver has 644 permissions on some nodes.

Expected results:

All files under /var/log/kube-apiserver/ should have 600 permissions.
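The `ls -l` check from the reproduction step can be scripted. A minimal Python sketch; `files_with_wrong_mode` is a hypothetical helper, and running it on a node assumes host access (for example after `chroot /host` in an `oc debug node/...` session) and that a Python interpreter is available there, which may not be the case on RHCOS.

```python
import os
import stat


def files_with_wrong_mode(directory, expected=0o600):
    """Return (name, mode) for regular files whose permission bits differ from expected."""
    mismatched = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        st = os.lstat(path)
        if not stat.S_ISREG(st.st_mode):
            continue  # skip directories, symlinks, sockets, ...
        mode = stat.S_IMODE(st.st_mode)
        if mode != expected:
            mismatched.append((name, oct(mode)))
    return mismatched


if __name__ == "__main__":
    log_dir = "/var/log/kube-apiserver"
    if os.path.isdir(log_dir):
        for name, mode in files_with_wrong_mode(log_dir):
            print(f"{name}: {mode} (expected 0o600)")
```

On the third node in the output above, this would flag termination.log with mode 0o644.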

Additional info:

https://github.com/openshift/kubernetes/pull/1638

Bug OCPBUGS-23376: OCP 4.14 IPI on Vsphere fails with "network '/Datacenter/network' not found" error

Description of problem:

A customer trying to deploy 4.14.1 IPI on vSphere is running into:

time="2023-11-14T14:30:35+01:00" level=fatal msg="failed to fetch Terraform Variables: failed to generate asset \"Terraform Variables\": network '/Datacenter_name/VLAN2506' not found

A similar configuration works fine with OCP 4.13.

The network VLAN2506 is available in the network list of the installer survey.

The network is available at '/datacenter/network/VLAN2506' when checked with the govc command.

We found https://bugzilla.redhat.com/show_bug.cgi?id=2063829, but that bug was reported for a network nested under a folder, whereas here the network sits directly inside the datacenter.

We tried this with the 4.14 installer in our lab environment but did not hit this issue.

Version-Release number of selected component (if applicable):
4.14.1

https://github.com/openshift/installer/pull/7737

Bug OCPBUGS-20331: previously disabled cluster capability Console unintentionally enabled during an upgrade

Description of problem:

A 4.13 cluster is installed with:
baselineCapabilitySet: None
additionalEnabledCapabilities: ['NodeTuning', 'CSISnapshot']

An upgrade to 4.14 causes the previously disabled Console capability to become implicitly enabled (in contrast with the capabilities newly added in 4.14, which are expected to be enabled implicitly in this case).

'ImplicitlyEnabledCapabilities'
{
  "lastTransitionTime": "2023-10-09T19:08:29Z",
  "message": "The following capabilities could not be disabled: Console, ImageRegistry, MachineAPI",
  "reason": "CapabilitiesImplicitlyEnabled",
  "status": "True",
  "type": "ImplicitlyEnabledCapabilities"
}

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-10-08-220853

How reproducible:

100%

Steps to Reproduce:

as described above

Additional info:

The root cause appears to be https://github.com/openshift/cluster-kube-apiserver-operator/pull/1542

More info in https://redhat-internal.slack.com/archives/CB48XQ4KZ/p1696940380413289

Bug OCPBUGS-21636: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/telemeter/pull/483

Bug OCPBUGS-21915: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/multus-cni/pull/193

Bug OCPBUGS-19126: Update 4.15 ose-cluster-dns-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-dns-operator/pull/380

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-dns-operator/pull/380

Bug OCPBUGS-19201: Update 4.15 ose-alibaba-machine-controllers image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-api-provider-alibaba/pull/44

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-api-provider-alibaba/pull/44

Bug OCPBUGS-19992: Missing v6-primary logic on VSphere UPI

Description of problem:

We are missing the new logic for handling v6-primary in the VSphere UPI nodeip-configuration service: https://github.com/openshift/machine-config-operator/blob/ea88304dd6de521d55a9d3413a764f618af2425a/templates/common/vsphere/units/nodeip-configuration-vsphere-upi.service.yaml#L40

https://github.com/openshift/machine-config-operator/pull/3670 addresses this, but unfortunately it did not make 4.14, so we will need to backport it.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:

https://github.com/openshift/machine-config-operator/pull/3670

Bug OCPBUGS-22043: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cloud-provider-gcp/pull/46

Bug OCPBUGS-25724: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/origin/pull/28483

Bug OCPBUGS-17850: common user can view UWM alertmanager alerts

Description of problem:

Enable UWM and the UWM Alertmanager:

$ oc -n openshift-monitoring get cm cluster-monitoring-config -oyaml
apiVersion: v1
data:
  config.yaml: |
    enableUserWorkload: true
kind: ConfigMap
metadata:
  creationTimestamp: "2023-08-17T06:02:36Z"
  name: cluster-monitoring-config
  namespace: openshift-monitoring
  resourceVersion: "259151"
  uid: a9365c21-5c1d-4c91-98ee-f074b023dd31

$ oc -n openshift-user-workload-monitoring get cm user-workload-monitoring-config -oyaml
apiVersion: v1
data:
  config.yaml: |
    alertmanager:
      enabled: true
kind: ConfigMap
metadata:
  creationTimestamp: "2023-08-17T06:02:44Z"
  labels:
    app.kubernetes.io/managed-by: cluster-monitoring-operator
    app.kubernetes.io/part-of: openshift-monitoring
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
  resourceVersion: "148193"
  uid: b3c6e5a6-ff7b-4ae4-85eb-28be683119e4

$ oc -n openshift-user-workload-monitoring get pod
NAME                                   READY   STATUS    RESTARTS   AGE
alertmanager-user-workload-0           6/6     Running   0          4h50m
alertmanager-user-workload-1           6/6     Running   0          4h50m
prometheus-operator-77bcdcbd9c-7nt6v   2/2     Running   0          6h14m
prometheus-user-workload-0             6/6     Running   0          6h14m
prometheus-user-workload-1             6/6     Running   0          6h14m
thanos-ruler-user-workload-0           4/4     Running   0          4h50m
thanos-ruler-user-workload-1           4/4     Running   0          4h50m

As the kubeadmin user, create a namespace and a PrometheusRule so that the alert fires:

apiVersion: v1
kind: Namespace
metadata:
  name: ns1
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-alert
  namespace: ns1
spec:
  groups:
  - name: example
    rules:
    - alert: TestAlert
      expr: vector(1)
      labels:
        severity: none
      annotations:
        message: This is an alert meant to ensure that the entire alerting pipeline is functional.

The alert is visible from the UWM Alertmanager:

$ token=`oc create token prometheus-k8s -n openshift-monitoring`
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://alertmanager-user-workload.openshift-user-workload-monitoring.svc:9095/api/v2/alerts' | jq
[
  {
    "annotations": {
      "message": "This is an alert meant to ensure that the entire alerting pipeline is functional."
    },
    "endsAt": "2023-08-17T12:08:41.558Z",
    "fingerprint": "348490d73f8513a0",
    "receivers": [
      {
        "name": "Default"
      }
    ],
    "startsAt": "2023-08-17T12:04:11.558Z",
    "status": {
      "inhibitedBy": [],
      "silencedBy": [],
      "state": "active"
    },
    "updatedAt": "2023-08-17T12:04:41.583Z",
    "generatorURL": "https://thanos-querier-openshift-monitoring.apps.***/api/graph?g0.expr=vector%281%29&g0.tab=1",
    "labels": {
      "alertname": "TestAlert",
      "namespace": "ns1",
      "severity": "none"
    }
  }
]

Open another terminal (or have another person run the following commands in their own terminal):

##### log in as a common user; the pod is deployed to the project only so that we can run the curl command
# oc login https://${api_server}:6443 -u ${user} -p ${password}
# oc new-project test
# oc -n test new-app rails-postgresql-example
# oc -n test get pod
NAME                                  READY   STATUS      RESTARTS   AGE
postgresql-1-deploy                   0/1     Completed   0          13m
postgresql-1-v4lz5                    1/1     Running     0          13m
rails-postgresql-example-1-build      0/1     Completed   0          13m
rails-postgresql-example-1-crdbq      1/1     Running     0          9m20s
rails-postgresql-example-1-deploy     0/1     Completed   0          9m42s
rails-postgresql-example-1-hook-pre   0/1     Completed   0          9m39s
# token=`oc whoami -t`
# echo $token
sha256~EJCVjflM6lbsl8plKkU7Hv0swkQMxySJr5BGXRJaKhU

The common user can also see the alert from the UWM Alertmanager service:

# oc -n test exec postgresql-1-v4lz5 -- curl -k -H "Authorization: Bearer $token" 'https://alertmanager-user-workload.openshift-user-workload-monitoring.svc:9095/api/v2/alerts'  | jq
[
  {
    "annotations": {
      "message": "This is an alert meant to ensure that the entire alerting pipeline is functional."
    },
    "endsAt": "2023-08-17T12:16:56.558Z",
    "fingerprint": "348490d73f8513a0",
    "receivers": [
      {
        "name": "Default"
      }
    ],
    "startsAt": "2023-08-17T12:04:11.558Z",
    "status": {
      "inhibitedBy": [],
      "silencedBy": [],
      "state": "active"
    },
    "updatedAt": "2023-08-17T12:12:56.563Z",
    "generatorURL": "https://thanos-querier-openshift-monitoring.apps.***/api/graph?g0.expr=vector%281%29&g0.tab=1",
    "labels": {
      "alertname": "TestAlert",
      "namespace": "ns1",
      "severity": "none"
    }
  }
]

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-08-16-114741

How reproducible:

always

Steps to Reproduce:

1. see the description

Actual results:

A common user can view UWM Alertmanager alerts.

Expected results:

Additional info:

If this is expected, we can close the bug.

https://github.com/openshift/cluster-monitoring-operator/pull/2099

Bug OCPBUGS-19180: Update 4.15 ose-ibm-cloud-controller-manager image to be consistent with ART

Please review the following PR: https://github.com/openshift/cloud-provider-ibm/pull/53

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cloud-provider-ibm/pull/53

Bug OCPBUGS-24375: oc process command fails while running it with a template file

Description of problem:

The oc process command fails when it is run against a local template file.

Version-Release number of selected component (if applicable):

4.12.41

How reproducible:

100%

Steps to Reproduce:

1. Create a new project and a template file 
$ oc new-project test
$ oc get template httpd-example -n openshift -o yaml > /tmp/template_http.yaml 

2. Run oc process command as given below
$ oc process -f /tmp/template_http.yaml 
error: unable to process template: the namespace of the provided object does not match the namespace sent on the request

3. Processing the same template directly from its own namespace works:
$ oc process openshift//httpd-example

4. $ oc version
Client Version: 4.12.41
Kustomize Version: v4.5.7
Server Version: 4.12.42
Kubernetes Version: v1.25.14+bcb9a60

Actual results:

$ oc process -f /tmp/template_http.yaml
error: unable to process template: the namespace of the provided object does not match the namespace sent on the request

Expected results:

The command should display the resources it would create.

Additional info:

https://github.com/openshift/oc/pull/1612
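The exported file still carries metadata.namespace: openshift, while the request is sent to the current project (test), which matches the wording of the error. A possible workaround, sketched below under that assumption, is to drop only the top-level metadata.namespace before running oc process -f. The helper name strip_template_namespace is hypothetical and not part of oc, and it assumes the two-space YAML indentation that oc get -o yaml produces.

```shell
# Hypothetical helper (not part of oc): copy an exported template while
# dropping only its top-level metadata.namespace line.
# Assumes two-space indentation as produced by `oc get ... -o yaml`;
# namespaces inside objects are indented deeper and are left untouched.
strip_template_namespace() {
  awk '
    /^[^ ]/ { in_meta = ($0 == "metadata:") }  # entering a new top-level key
    in_meta && /^  namespace:/ { next }         # skip metadata.namespace only
    { print }
  ' "$1" > "$2"
}

# Usage (requires a logged-in oc session):
# strip_template_namespace /tmp/template_http.yaml /tmp/template_http_nons.yaml
# oc process -f /tmp/template_http_nons.yaml
```

Whether oc process then succeeds on an affected 4.12 client is an assumption, not something verified in this report.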

Bug OCPBUGS-27001: Cannot change default network type when not doing migration

This is a clone of issue OCPBUGS-25760. The following is the description of the original issue:
—
Description of problem:

During a live OVN migration, the network operator shows the error message: Not applying unsafe configuration change: invalid configuration: [cannot change default network type when not doing migration]. Use 'oc edit network.operator.openshift.io cluster' to undo the change.

Version-Release number of selected component (if applicable):

How reproducible:

Always

Steps to Reproduce:

1. Create 4.15 nightly SDN ROSA cluster
2. oc delete validatingwebhookconfigurations.admissionregistration.k8s.io/sre-techpreviewnoupgrade-validation
3. oc edit featuregate cluster to enable featuregates 
4. Wait for all node rebooting and back to normal
5. oc patch Network.config.openshift.io cluster --type='merge' --patch '{"metadata":{"annotations":{"network.openshift.io/network-type-migration":""}},"spec":{"networkType":"OVNKubernetes"}}'

Actual results:

[weliang@weliang ~]$ oc delete validatingwebhookconfigurations.admissionregistration.k8s.io/sre-techpreviewnoupgrade-validation
[weliang@weliang ~]$ oc edit featuregate cluster
[weliang@weliang ~]$ oc patch Network.config.openshift.io cluster --type='merge' --patch '{"metadata":{"annotations":{"network.openshift.io/network-type-migration":""}},"spec":{"networkType":"OVNKubernetes"}}'
network.config.openshift.io/cluster patched
[weliang@weliang ~]$
[weliang@weliang ~]$ oc get co network
NAME      VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
network   4.15.0-0.nightly-2023-12-18-220750   True        False         True       105m    Not applying unsafe configuration change: invalid configuration: [cannot change default network type when not doing migration]. Use 'oc edit network.operator.openshift.io cluster' to undo the change.
[weliang@weliang ~]$ oc describe Network.config.openshift.io cluster
Name:         cluster
Namespace:
Labels:       <none>
Annotations:  network.openshift.io/network-type-migration:
API Version:  config.openshift.io/v1
Kind:         Network
Metadata:
  Creation Timestamp:  2023-12-20T15:13:39Z
  Generation:          3
  Resource Version:    119899
  UID:                 6a621b88-ac4f-4918-a7f6-98dba7df222c
Spec:
  Cluster Network:
    Cidr:         10.128.0.0/14
    Host Prefix:  23
  External IP:
    Policy:
  Network Type:  OVNKubernetes
  Service Network:
    172.30.0.0/16
Status:
  Cluster Network:
    Cidr:               10.128.0.0/14
    Host Prefix:        23
  Cluster Network MTU:  8951
  Network Type:         OpenShiftSDN
  Service Network:
    172.30.0.0/16
Events:  <none>
[weliang@weliang ~]$ oc describe Network.operator.openshift.io cluster
Name:         cluster
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  operator.openshift.io/v1
Kind:         Network
Metadata:
  Creation Timestamp:  2023-12-20T15:15:37Z
  Generation:          275
  Resource Version:    120026
  UID:                 278bd491-ac88-4038-887f-d1defc450740
Spec:
  Cluster Network:
    Cidr:         10.128.0.0/14
    Host Prefix:  23
  Default Network:
    Openshift SDN Config:
      Enable Unidling:          true
      Mode:                     NetworkPolicy
      Mtu:                      8951
      Vxlan Port:               4789
    Type:                       OVNKubernetes
  Deploy Kube Proxy:            false
  Disable Multi Network:        false
  Disable Network Diagnostics:  false
  Kube Proxy Config:
    Bind Address:      0.0.0.0
  Log Level:           Normal
  Management State:    Managed
  Observed Config:     <nil>
  Operator Log Level:  Normal
  Service Network:
    172.30.0.0/16
  Unsupported Config Overrides:  <nil>
  Use Multi Network Policy:      false
Status:
  Conditions:
    Last Transition Time:  2023-12-20T15:15:37Z
    Status:                False
    Type:                  ManagementStateDegraded
    Last Transition Time:  2023-12-20T16:58:58Z
    Message:               Not applying unsafe configuration change: invalid configuration: [cannot change default network type when not doing migration]. Use 'oc edit network.operator.openshift.io cluster' to undo the change.
    Reason:                InvalidOperatorConfig
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2023-12-20T15:15:37Z
    Status:                True
    Type:                  Upgradeable
    Last Transition Time:  2023-12-20T16:52:11Z
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2023-12-20T15:15:45Z
    Status:                True
    Type:                  Available
  Ready Replicas:          0
  Version:                 4.15.0-0.nightly-2023-12-18-220750
Events:                    <none>
[weliang@weliang ~]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.0-0.nightly-2023-12-18-220750   True        False         84m     Error while reconciling 4.15.0-0.nightly-2023-12-18-220750: the cluster operator network is degraded
[weliang@weliang ~]$

Expected results:

The migration succeeds.

Additional info:

The same error message appears on both ROSA and GCP clusters.

https://github.com/openshift/cluster-network-operator/pull/2194
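The describe output in this report shows the mismatch behind the degraded operator: Spec.Network Type is OVNKubernetes while Status.Network Type is still OpenShiftSDN. The sketch below reports that state from the same two fields. The function name network_type_state is hypothetical; it only compares the fields and does not tell you why the operator rejected the change.

```shell
# Hypothetical helper: compare .spec.networkType with .status.networkType
# on network.config.openshift.io/cluster (the fields shown in the
# describe output above) and report whether they have converged.
network_type_state() {
  types=$(oc get network.config.openshift.io cluster \
    -o jsonpath='{.spec.networkType}{" "}{.status.networkType}')
  spec_type=${types%% *}   # first word: desired type
  status_type=${types#* }  # remainder: type currently in effect
  if [ "$spec_type" = "$status_type" ]; then
    echo "converged: $status_type"
  else
    echo "pending: spec=$spec_type status=$status_type"
  fi
}

# Usage (requires a logged-in oc session):
# network_type_state
```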

Bug OCPBUGS-18854: Update 4.15 prom-label-proxy image to be consistent with ART

Please review the following PR: https://github.com/openshift/prom-label-proxy/pull/357

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/prom-label-proxy/pull/357

Bug OCPBUGS-19512: Faster risk cache warming

OCPBUGS-5469 and its backports began prioritizing later target releases, but we still wait 10m between different PromQL evaluations while evaluating conditional update risks. This ticket tracks the work to speed up cache warming, and allows for changes that are too invasive to be worth backporting.

Definition of done:

When presented with new risks, the CVO will initially evaluate one PromQL expression every second or so, instead of waiting 10m between different evaluations. Each PromQL expression will still only be evaluated once every hour or so, to avoid excessive load on the PromQL engine.

Acceptance Criteria:

After changing the channel and receiving a new graph, conditional risks are evaluated as quickly as possible, ideally in less than 500ms per unique risk.

https://github.com/openshift/cluster-version-operator/pull/939

Bug OCPBUGS-25236: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/azure-disk-csi-driver/pull/68

Bug OCPBUGS-22473: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/hypershift/pull/3206

Bug OCPBUGS-22471: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/openshift-controller-manager/pull/280

Bug OCPBUGS-23255: Baremetal clusters installed with the agent installer are not skipping the first boot if they use FIPS

Description of problem:

When a cluster is using FIPS in an installation with the agent installer, the reboot in the machine-config-daemon-firstboot.service is not skipped.

Since https://issues.redhat.com/browse/MCO-706 the agent installer should be able to skip the firstboot service reboot.

Version-Release number of selected component (if applicable):

4.15

How reproducible:

Always

Steps to Reproduce:

1. We use these Prow jobs to install a cluster:

without fips (HA): periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-baremetal-pxe-ha-agent-ipv4-static-connected-f14

with fips (SNO):  periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-baremetal-sno-agent-ipv4-static-connected-f7


We can find the firstboot service's logs in the must-gather.tar file.

Actual results:

In the machine-config-daemon-firstboot.service logs we can see that the reboot is not skipped when the installation is using fips=true.

You can find the logs in the "additional info" section below.

Expected results:

The firstboot service should skip the reboot in the installation.

Additional info:

These are the machine-config-daemon-firstboot logs for a bare-metal HA cluster installed with FIPS using the agent installer (FIRST REBOOT NOT SKIPPED):


Nov 14 11:26:59 worker-00 systemd[1]: Starting Machine Config Daemon Firstboot...
Nov 14 11:26:59 worker-00 sh[4182]: sed: can't read /etc/yum.repos.d/*.repo: No such file or directory
Nov 14 11:26:59 worker-00 podman[4183]: W1114 11:26:59.393738       1 daemon.go:1673] Failed to persist NIC names: open /rootfs/etc/systemd/network: no such file or directory
Nov 14 11:26:59 worker-00 podman[4296]: I1114 11:26:59.866300    4348 daemon.go:457] container is rhel8, target is rhel9
Nov 14 11:26:59 worker-00 podman[4296]: I1114 11:26:59.896550    4348 daemon.go:525] Invoking re-exec /run/bin/machine-config-daemon
Nov 14 11:26:59 worker-00 podman[4296]: I1114 11:26:59.955660    4348 update.go:2120] Running: systemctl daemon-reload
Nov 14 11:27:00 worker-00 podman[4296]: I1114 11:27:00.537582    4348 rpm-ostree.go:88] Enabled workaround for bug 2111817
Nov 14 11:27:00 worker-00 podman[4296]: I1114 11:27:00.537944    4348 rpm-ostree.go:263] Linking ostree authfile to /etc/mco/internal-registry-pull-secret.json
Nov 14 11:27:00 worker-00 podman[4296]: I1114 11:27:00.833062    4348 daemon.go:270] Booted osImageURL: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a9bdfdf95023b7aebbbc9d5d335c973832fceb795ed943f365fefea7db646b66 (415.92.202311130854-0) 67df227c04e9306ddcb78331654ecf0ebb2cb1433498f9c12e832c7d5e74c1d9
Nov 14 11:27:00 worker-00 podman[4296]: I1114 11:27:00.833303    4348 rpm-ostree.go:308] Running captured: rpm-ostree --version
Nov 14 11:27:00 worker-00 podman[4296]: I1114 11:27:00.893156    4348 daemon.go:1076] rpm-ostree has container feature
Nov 14 11:27:00 worker-00 podman[4296]: I1114 11:27:00.893582    4348 rpm-ostree.go:308] Running captured: rpm-ostree kargs
Nov 14 11:27:01 worker-00 podman[4296]: I1114 11:27:01.008588    4348 update.go:2157] Adding SIGTERM protection
Nov 14 11:27:01 worker-00 podman[4296]: I1114 11:27:01.008821    4348 update.go:599] Checking Reconcilable for config mco-empty-mc to rendered-worker-ef30fce69107b4fc38dc1020038ebd6a
Nov 14 11:27:01 worker-00 podman[4296]: I1114 11:27:01.009121    4348 update.go:1064] FIPS is configured and enabled
Nov 14 11:27:01 worker-00 podman[4296]: I1114 11:27:01.009345    4348 update.go:2135] Starting update from mco-empty-mc to rendered-worker-ef30fce69107b4fc38dc1020038ebd6a: &{osUpdate:true kargs:true fips:false passwd:false files:false units:false kernelType:false extensions:false}
Nov 14 11:27:01 worker-00 podman[4296]: I1114 11:27:01.055403    4348 update.go:1349] Updating files
Nov 14 11:27:01 worker-00 podman[4296]: I1114 11:27:01.055415    4348 update.go:1412] Deleting stale data
Nov 14 11:27:01 worker-00 podman[4296]: I1114 11:27:01.055419    4348 update.go:1818] updating the permission of the kubeconfig to: 0o600
Nov 14 11:27:01 worker-00 podman[4296]: I1114 11:27:01.055484    4348 update.go:1784] Checking if absent users need to be disconfigured
Nov 14 11:27:01 worker-00 podman[4296]: I1114 11:27:01.055610    4348 update.go:2210] Already in desired image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a9bdfdf95023b7aebbbc9d5d335c973832fceb795ed943f365fefea7db646b66
Nov 14 11:27:01 worker-00 podman[4296]: I1114 11:27:01.055616    4348 update.go:2120] Running: rpm-ostree cleanup -p
Nov 14 11:27:01 worker-00 podman[4296]: Deployments unchanged.
Nov 14 11:27:01 worker-00 podman[4296]: I1114 11:27:01.224788    4348 update.go:2135] Running rpm-ostree [kargs --append=systemd.unified_cgroup_hierarchy=1 --append=cgroup_no_v1="all" --append=psi=1]
Nov 14 11:27:01 worker-00 podman[4296]: I1114 11:27:01.271647    4348 update.go:2120] Running: rpm-ostree kargs --append=systemd.unified_cgroup_hierarchy=1 --append=cgroup_no_v1="all" --append=psi=1
Nov 14 11:27:03 worker-00 podman[4296]: Staging deployment...done
Nov 14 11:27:05 worker-00 podman[4296]: Changes queued for next boot. Run "systemctl reboot" to start a reboot
Nov 14 11:27:05 worker-00 podman[4296]: I1114 11:27:05.081854    4348 update.go:2135] Rebooting node
Nov 14 11:27:05 worker-00 podman[4296]: I1114 11:27:05.127794    4348 update.go:2165] Removing SIGTERM protection
Nov 14 11:27:05 worker-00 podman[4296]: I1114 11:27:05.127853    4348 update.go:2135] initiating reboot: Completing firstboot provisioning to rendered-worker-ef30fce69107b4fc38dc1020038ebd6a
Nov 14 11:27:05 worker-00 podman[4296]: I1114 11:27:05.235062    4348 update.go:2135] reboot successful
Nov 14 11:27:05 worker-00 systemd[1]: machine-config-daemon-firstboot.service: Main process exited, code=killed, status=15/TERM
Nov 14 11:27:05 worker-00 systemd[1]: machine-config-daemon-firstboot.service: Failed with result 'signal'.
Nov 14 11:27:05 worker-00 systemd[1]: Stopped Machine Config Daemon Firstboot.
-- Boot 2f510f83bdb047bb921fc429d67b8e6a --




These are the logs for a bare-metal HA cluster installed without FIPS using the agent installer (FIRST REBOOT SKIPPED):


Nov 08 14:27:30 worker-00 systemd[1]: Starting Machine Config Daemon Firstboot...
Nov 08 14:27:30 worker-00 sh[4171]: sed: can't read /etc/yum.repos.d/*.repo: No such file or directory
Nov 08 14:27:30 worker-00 podman[4172]: W1108 14:27:30.970986       1 daemon.go:1673] Failed to persist NIC names: open /rootfs/etc/systemd/network: no such file or directory
Nov 08 14:27:31 worker-00 podman[4273]: I1108 14:27:31.172975    4320 daemon.go:457] container is rhel8, target is rhel9
Nov 08 14:27:31 worker-00 podman[4273]: I1108 14:27:31.202238    4320 daemon.go:525] Invoking re-exec /run/bin/machine-config-daemon
Nov 08 14:27:31 worker-00 podman[4273]: I1108 14:27:31.237492    4320 update.go:2120] Running: systemctl daemon-reload
Nov 08 14:27:31 worker-00 podman[4273]: I1108 14:27:31.436217    4320 rpm-ostree.go:88] Enabled workaround for bug 2111817
Nov 08 14:27:31 worker-00 podman[4273]: E1108 14:27:31.436346    4320 rpm-ostree.go:285] Merged secret file could not be validated; defaulting to cluster pull secret <nil>
Nov 08 14:27:31 worker-00 podman[4273]: I1108 14:27:31.436375    4320 rpm-ostree.go:263] Linking ostree authfile to /var/lib/kubelet/config.json
Nov 08 14:27:31 worker-00 podman[4273]: I1108 14:27:31.555415    4320 daemon.go:270] Booted osImageURL: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e03c9248f78a107efb8b12430d46304e8d93981d23fd932e159d518ed675bc92 (415.92.202311061558-0) b8e1dca18619a2e497edf5346d5018615a226da380989ef6720a1a8cdc27adeb
Nov 08 14:27:31 worker-00 podman[4273]: I1108 14:27:31.555920    4320 rpm-ostree.go:308] Running captured: rpm-ostree --version
Nov 08 14:27:31 worker-00 podman[4273]: I1108 14:27:31.571985    4320 daemon.go:1076] rpm-ostree has container feature
Nov 08 14:27:31 worker-00 podman[4273]: I1108 14:27:31.572484    4320 rpm-ostree.go:308] Running captured: rpm-ostree kargs
Nov 08 14:27:31 worker-00 podman[4273]: I1108 14:27:31.600313    4320 update.go:186] No changes from mco-empty-mc to rendered-worker-30da1eef7a5d361fc395f2726c8210d5
Nov 08 14:27:31 worker-00 systemd[1]: Finished Machine Config Daemon Firstboot.
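When triaging more runs, the two logs above can be told apart by a single marker line each. The sketch below classifies a saved journal excerpt, for example one taken from the must-gather, using those marker strings. The function name firstboot_result is hypothetical, and the strings are copied from this report, so other MCO versions may word them differently.

```shell
# Hypothetical helper: classify a machine-config-daemon-firstboot journal
# excerpt using marker strings copied from the two logs in this report.
firstboot_result() {
  if grep -q 'initiating reboot: Completing firstboot provisioning' "$1"; then
    echo "reboot-not-skipped"   # the FIPS case above
  elif grep -q 'No changes from mco-empty-mc' "$1"; then
    echo "reboot-skipped"       # the non-FIPS case above
  else
    echo "unknown"
  fi
}

# Usage on a saved copy of the unit's journal:
# journalctl -u machine-config-daemon-firstboot.service > /tmp/firstboot.log
# firstboot_result /tmp/firstboot.log
```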

https://github.com/openshift/machine-config-operator/pull/4033

Bug OCPBUGS-19427: Whitespace at the end of URL in ICSP is carried over into resources conf file and results in invalid url errors

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/machine-config-operator/pull/4064

Bug OCPBUGS-23495: "duplicate port definition" warning message in 4.15 UWM prometheus-operator

Description of problem:

On 4.15.0-0.nightly-2023-10-06-123200 (Prometheus Operator 0.68.0), the prometheus-operator logs a "duplicate port definition" warning:

$ oc logs deployment/prometheus-operator -n openshift-monitoring | grep "duplicate port definition with" -C2
level=info ts=2023-10-08T01:44:40.586511278Z caller=operator.go:655 component=alertmanageroperator key=openshift-monitoring/main msg="sync alertmanager"
level=info ts=2023-10-08T01:44:40.626492507Z caller=operator.go:655 component=alertmanageroperator key=openshift-monitoring/main msg="sync alertmanager"
level=warn ts=2023-10-08T01:44:40.628520232Z caller=klog.go:96 component=k8s_client_runtime func=Warning msg="spec.template.spec.containers[5].ports[0]: duplicate port definition with spec.template.spec.containers[2].ports[0]"
level=info ts=2023-10-08T01:44:40.63072762Z caller=operator.go:1189 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
level=info ts=2023-10-08T01:44:40.91709494Z caller=operator.go:1189 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
--
level=info ts=2023-10-08T01:45:19.85277831Z caller=operator.go:655 component=alertmanageroperator key=openshift-monitoring/main msg="sync alertmanager"
level=info ts=2023-10-08T01:45:24.014118091Z caller=operator.go:1189 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
level=warn ts=2023-10-08T01:45:24.256334754Z caller=klog.go:96 component=k8s_client_runtime func=Warning msg="spec.template.spec.containers[5].ports[0]: duplicate port definition with spec.template.spec.containers[2].ports[0]"
level=info ts=2023-10-08T01:45:24.259230552Z caller=operator.go:1189 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
level=info ts=2023-10-08T01:45:24.50510448Z caller=operator.go:1189 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
--
level=info ts=2023-10-08T07:33:33.724893975Z caller=operator.go:1310 component=prometheusoperator key=openshift-monitoring/k8s statefulset=prometheus-k8s shard=0 msg="recreating StatefulSet because the update operation wasn't possible" reason="Forbidden: updates to statefulset spec for fields other than 'replicas', 'ordinals', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden"
level=info ts=2023-10-08T07:33:35.232445429Z caller=operator.go:1189 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
level=warn ts=2023-10-08T07:33:35.442232343Z caller=klog.go:96 component=k8s_client_runtime func=Warning msg="spec.template.spec.containers[5].ports[0]: duplicate port definition with spec.template.spec.containers[2].ports[0]"
level=info ts=2023-10-08T07:33:35.445827197Z caller=operator.go:1189 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
level=info ts=2023-10-08T07:33:35.708322936Z caller=operator.go:1189 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"

The kube-rbac-proxy-thanos and thanos-sidecar containers both declare port 10902. There is no functional effect, so the warning may be expected; if so, we can close this bug.

$ oc -n openshift-monitoring get sts prometheus-k8s -ojsonpath='{.spec.template.spec.containers[5].ports[0]}' | jq
{
  "containerPort": 10902,
  "name": "thanos-proxy",
  "protocol": "TCP"
}

$ oc -n openshift-monitoring get sts prometheus-k8s -ojsonpath='{.spec.template.spec.containers[2].ports[0]}' | jq
{
  "containerPort": 10902,
  "name": "http",
  "protocol": "TCP"
}

$ oc -n openshift-monitoring get sts prometheus-k8s -ojsonpath='{.spec.template.spec.containers[5].name}' 
kube-rbac-proxy-thanos

$ oc -n openshift-monitoring get sts prometheus-k8s -ojsonpath='{.spec.template.spec.containers[2].name}' 
thanos-sidecar
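Rather than querying container indexes one at a time, every containerPort shared by more than one container can be listed in one pass. The sketch below reads one line per container, in the form name followed by its ports, which the jsonpath in the usage comment produces. The function name find_duplicate_ports is hypothetical. It also flags a port declared twice with different protocols, which Kubernetes does not treat as a duplicate.

```shell
# Hypothetical helper: read "<container> <port> <port> ..." lines on stdin
# and print each containerPort declared by more than one container,
# followed by a comma-separated list of the containers that declare it.
find_duplicate_ports() {
  awk '
    {
      for (i = 2; i <= NF; i++) {
        count[$i]++
        owners[$i] = (count[$i] == 1) ? $1 : (owners[$i] "," $1)
      }
    }
    END { for (p in count) if (count[p] > 1) print p, owners[p] }
  '
}

# Usage (requires a logged-in oc session):
# oc -n openshift-monitoring get sts prometheus-k8s \
#   -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{range .ports[*]}{" "}{.containerPort}{end}{"\n"}{end}' \
#   | find_duplicate_ports
```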

Checked in 4.14 (prometheus-operator version 0.67.1): the warning is not logged, although the same two containers declare port 10902.

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-10-06-234925   True        False         3h33m   Cluster version is 4.14.0-0.nightly-2023-10-06-234925

$ oc logs deployment/prometheus-operator -n openshift-monitoring | grep "duplicate port definition with" -C2
no result

$ oc -n openshift-monitoring get sts prometheus-k8s -ojsonpath='{.spec.template.spec.containers[5].ports[0]}' | jq
{
  "containerPort": 10902,
  "name": "thanos-proxy",
  "protocol": "TCP"
}

$ oc -n openshift-monitoring get sts prometheus-k8s -ojsonpath='{.spec.template.spec.containers[2].ports[0]}' | jq
{
  "containerPort": 10902,
  "name": "http",
  "protocol": "TCP"
}

$ oc -n openshift-monitoring get sts prometheus-k8s -ojsonpath='{.spec.template.spec.containers[5].name}'
kube-rbac-proxy-thanos

$ oc -n openshift-monitoring get sts prometheus-k8s -ojsonpath='{.spec.template.spec.containers[2].name}'
thanos-sidecar

Version-Release number of selected component (if applicable):

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.0-0.nightly-2023-10-06-123200   True        False         7h1m    Cluster version is 4.15.0-0.nightly-2023-10-06-123200

How reproducible:

Always in 4.15

Steps to Reproduce:

1. Check the prometheus-operator logs.

Actual results:

"duplicate port definition" warning message in 4.15 prometheus-operator

Expected results:

Additional info:

We can close this bug, since the warning seems to be expected.

https://github.com/openshift/cluster-monitoring-operator/pull/2164

Bug OCPBUGS-13669: Azure-file-CSI-Driver should not be installed on Azure Stack Hub

Description of problem:

Azure Stack Hub doesn't support Azure-file yet (from https://learn.microsoft.com/en-us/azure-stack/user/azure-stack-acs-differences?view=azs-2206), so we should not install Azure-file-CSI-Driver on it.

$ oc get infrastructures cluster -o json | jq .status.platformStatus.azure
{
  "armEndpoint": "https://management.mtcazs.wwtatc.com",
  "cloudName": "AzureStackCloud",
  "networkResourceGroupName": "wduan-0516b-ash-rs7gh-rg",
  "resourceGroupName": "wduan-0516b-ash-rs7gh-rg"
}
$ oc get clustercsidrivers file.csi.azure.com
NAME                 AGE
file.csi.azure.com   45m
$ oc get sc azurefile-csi
NAME            PROVISIONER          RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
azurefile-csi   file.csi.azure.com   Delete          Immediate           true                   47m
$ oc describe pvc mydep-pvc-02
  Warning  ProvisioningFailed  <invalid>  file.csi.azure.com_wduan-0516b-ash-rs7gh-master-1_19c3f203-70a7-4d7f-afcc-22665adff5fe  failed to provision volume with StorageClass "azurefile-csi": rpc error: code = Internal desc = failed to ensure storage account: failed to create storage account f0f49c11984fb413a958286, error: &{false 400 0001-01-01 00:00:00 +0000 UTC {
  "code": "StorageAccountInvalidKind",
  "message": "The requested storage account kind is invalid in this location.",
  "target": "StorageAccount"
}}

Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-05-11-225357

How reproducible:
Always

Steps to Reproduce:
See Description

Actual results:
Azure-file-CSI-Driver is installed on Azure Stack Hub

Expected results:
Azure-file-CSI-Driver should not be installed on Azure Stack Hub
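The Description identifies Azure Stack Hub by cloudName: AzureStackCloud in the infrastructure status. The sketch below expresses that gating condition as a shell check. It is illustrative only: the real decision is made inside cluster-storage-operator, and the function name azure_file_supported is hypothetical.

```shell
# Illustrative check only (the actual logic lives in cluster-storage-operator):
# succeed unless the Azure cloud name is Azure Stack Hub ("AzureStackCloud",
# the value reported in this bug).
azure_file_supported() {
  [ "$1" != "AzureStackCloud" ]
}

# Usage (requires a logged-in oc session):
# cloud=$(oc get infrastructures cluster -o jsonpath='{.status.platformStatus.azure.cloudName}')
# azure_file_supported "$cloud" || echo "Azure File CSI driver should not be installed"
```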

https://github.com/openshift/cluster-storage-operator/pull/395

Bug OCPBUGS-14819: CA bundles for hosted cluster monitoring not created

Description of problem:

The alertmanager-trusted-ca-bundle, prometheus-trusted-ca-bundle, telemeter-trusted-ca-bundle, and thanos-querier-trusted-ca-bundle config maps are empty on the hosted cluster. As a result, CMO does not create the Prometheus CR, so no Prometheus pods run.

This issue prevents us from monitoring the hosted cluster.

Version-Release number of selected component (if applicable):

4.13.z

How reproducible:

Rare: only one occurrence found so far.

Steps to Reproduce:

1.
2.
3.

Actual results:

Certs are not created, and no Prometheus pods are created.

Expected results:

Certs are created and CMO can create prometheus pods
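To check whether a hosted cluster is affected, the four config maps named above can be tested for empty bundle data. The sketch below does that with oc. The data key ca-bundle.crt is an assumption based on the usual trusted-CA injection key, and the function name check_ca_bundles is hypothetical.

```shell
# Hypothetical helper: print "EMPTY: <name>" for each monitoring CA bundle
# config map whose ca-bundle.crt key (assumed key name) is missing or empty.
# Returns non-zero if any bundle is empty.
check_ca_bundles() {
  ns=${1:-openshift-monitoring}
  rc=0
  for cm in alertmanager-trusted-ca-bundle prometheus-trusted-ca-bundle \
            telemeter-trusted-ca-bundle thanos-querier-trusted-ca-bundle; do
    data=$(oc -n "$ns" get configmap "$cm" -o jsonpath='{.data.ca-bundle\.crt}' 2>/dev/null)
    if [ -z "$data" ]; then
      echo "EMPTY: $cm"
      rc=1
    fi
  done
  return $rc
}

# Usage against the hosted cluster (requires a logged-in oc session):
# check_ca_bundles openshift-monitoring
```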

Additional info:

Linked: a must-gather of the management cluster (MC) and an inspect of the openshift-monitoring namespace on the data plane (DP).

Bug OCPBUGS-18783: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-node-tuning-operator/pull/791

Bug OCPBUGS-19278: Update 4.15 ovn-kubernetes-base image to be consistent with ART

Please review the following PR: https://github.com/openshift/ovn-kubernetes/pull/1882

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/ovn-kubernetes/pull/1882

Bug OCPBUGS-18498: spec.containers.image is empty when use 'oc new-app' created deploy when build/deploymentconfig are not installed

Description of problem:

If the Build and DeploymentConfig capabilities are not installed, running `oc new-app registry.redhat.io/<namespace>/<image>:<tag>` creates a deployment whose spec.containers[0].image is empty (a single space). The deployment then fails to start its pod.

Version-Release number of selected component (if applicable):

oc version
Client Version: 4.14.0-0.nightly-2023-08-22-221456
Kustomize Version: v5.0.1
Server Version: 4.14.0-0.nightly-2023-09-02-132842
Kubernetes Version: v1.27.4+2c83a9f

How reproducible:

Always

Steps to Reproduce:

1. Install a cluster without the Build and DeploymentConfig capabilities:
Set "baselineCapabilitySet: None" in install-config
2. Create a deployment using 'oc new-app':
oc new-app registry.redhat.io/ubi8/httpd-24:latest
3. Check the created deployment:
oc get deploy -o yaml

Actual results:

2.
$ oc new-app registry.redhat.io/ubi8/httpd-24:latest
--> Found container image c412709 (11 days old) from registry.redhat.io for "registry.redhat.io/ubi8/httpd-24:latest"

    Apache httpd 2.4
    ----------------
    Apache httpd 2.4 available as container, is a powerful, efficient, and extensible web server. Apache supports a variety of features, many implemented as compiled modules which extend the core functionality. These can range from server-side programming language support to authentication schemes. Virtual hosting allows one Apache installation to serve many different Web sites.

    Tags: builder, httpd, httpd-24

    * An image stream tag will be created as "httpd-24:latest" that will track this image

--> Creating resources ...
    imagestream.image.openshift.io "httpd-24" created
    deployment.apps "httpd-24" created
    service "httpd-24" created
--> Success
    Application is not exposed. You can expose services to the outside world by executing one or more of the commands below:
     'oc expose service/httpd-24'
    Run 'oc status' to view your app

3. oc get deploy -o yaml
apiVersion: v1
items:
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    annotations:
      deployment.kubernetes.io/revision: "1"
      image.openshift.io/triggers: '[{"from":{"kind":"ImageStreamTag","name":"httpd-24:latest"},"fieldPath":"spec.template.spec.containers[?(@.name==\"httpd-24\")].image"}]'
      openshift.io/generated-by: OpenShiftNewApp
    creationTimestamp: "2023-09-04T07:44:01Z"
    generation: 1
    labels:
      app: httpd-24
      app.kubernetes.io/component: httpd-24
      app.kubernetes.io/instance: httpd-24
    name: httpd-24
    namespace: wxg
    resourceVersion: "115441"
    uid: 909d0c4e-180c-4f88-8fb5-93c927839903
  spec:
    progressDeadlineSeconds: 600
    replicas: 1
    revisionHistoryLimit: 10
    selector:
      matchLabels:
        deployment: httpd-24
    strategy:
      rollingUpdate:
        maxSurge: 25%
        maxUnavailable: 25%
      type: RollingUpdate
    template:
      metadata:
        annotations:
          openshift.io/generated-by: OpenShiftNewApp
        creationTimestamp: null
        labels:
          deployment: httpd-24
      spec:
        containers:
        - image: ' '
          imagePullPolicy: IfNotPresent
          name: httpd-24
          ports:
          - containerPort: 8080
            protocol: TCP
          - containerPort: 8443
            protocol: TCP
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
        dnsPolicy: ClusterFirst
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        terminationGracePeriodSeconds: 30
  status:
    conditions:
    - lastTransitionTime: "2023-09-04T07:44:01Z"
      lastUpdateTime: "2023-09-04T07:44:01Z"
      message: Created new replica set "httpd-24-7f6b55cc85"
      reason: NewReplicaSetCreated
      status: "True"
      type: Progressing
    - lastTransitionTime: "2023-09-04T07:44:01Z"
      lastUpdateTime: "2023-09-04T07:44:01Z"
      message: Deployment does not have minimum availability.
      reason: MinimumReplicasUnavailable
      status: "False"
      type: Available
    - lastTransitionTime: "2023-09-04T07:44:01Z"
      lastUpdateTime: "2023-09-04T07:44:01Z"
      message: 'Pod "httpd-24-7f6b55cc85-pvvgt" is invalid: spec.containers[0].image:
        Invalid value: " ": must not have leading or trailing whitespace'
      reason: FailedCreate
      status: "True"
      type: ReplicaFailure
    observedGeneration: 1
    unavailableReplicas: 1
kind: List
metadata:

Expected results:

The container image (spec.containers[0].image) should be set to registry.redhat.io/ubi8/httpd-24:latest.
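The ReplicaFailure condition above explains why no pod was created: the generated Deployment set the image to a single space, and pod validation rejects it. Below is a minimal Go sketch of that one rule. It assumes only the whitespace check quoted in the condition message. It is not the actual Kubernetes validation code, and `validateImage` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// validateImage is a hypothetical helper that reproduces only the rule quoted
// in the ReplicaFailure condition: an image reference must not have leading
// or trailing whitespace.
func validateImage(image string) error {
	if strings.TrimSpace(image) != image {
		return fmt.Errorf("invalid value %q: must not have leading or trailing whitespace", image)
	}
	return nil
}

func main() {
	for _, img := range []string{" ", "registry.redhat.io/ubi8/httpd-24:latest"} {
		fmt.Printf("%q: %v\n", img, validateImage(img))
	}
}
```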

Additional info:

Bug OCPBUGS-19107: Update 4.15 ose-egress-http-proxy image to be consistent with ART

Please review the following PR: https://github.com/openshift/images/pull/150

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/images/pull/150

Bug OCPBUGS-19674: Wrong port reported in HostedCluster .status.controlPlaneEndpoint.port

Description of problem:

When a route is used to expose the API server endpoint in a HostedCluster, .status.controlPlaneEndpoint.port is reported as 6443 (the internal port) instead of 443, which is the port the route exposes externally.

Version-Release number of selected component (if applicable):

How reproducible:

Always

Steps to Reproduce:

1. Create a HostedCluster with a custom DNS name, using Route as the publishing strategy.
2. Inspect .status.controlPlaneEndpoint.

Actual results:

It has 6443 as the port

Expected results:

It has 443 as the port
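The expected behavior comes down to reporting the port that clients can actually reach, not the port the API server listens on. Below is a hedged Go sketch of that choice. It uses a simplified, hypothetical strategy type and assumes that non-route strategies keep reporting 6443. The real HyperShift API types and the fix in the linked PR may differ.

```go
package main

import "fmt"

// PublishingStrategy is a hypothetical stand-in for the HostedCluster API
// server publishing strategy; it is not the HyperShift API type.
type PublishingStrategy string

const (
	Route        PublishingStrategy = "Route"
	LoadBalancer PublishingStrategy = "LoadBalancer"
)

const (
	internalAPIServerPort = 6443 // port the kube-apiserver listens on
	routerHTTPSPort       = 443  // port the ingress router exposes for routes
)

// externalAPIPort returns the port to report in
// .status.controlPlaneEndpoint.port. With a route, traffic enters through
// the router's HTTPS port, so 6443 would be wrong for clients.
func externalAPIPort(s PublishingStrategy) int32 {
	if s == Route {
		return routerHTTPSPort
	}
	// Assumption for this sketch: other strategies expose 6443 directly.
	return internalAPIServerPort
}

func main() {
	for _, s := range []PublishingStrategy{Route, LoadBalancer} {
		fmt.Printf("%s -> %d\n", s, externalAPIPort(s))
	}
}
```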

Additional info:

https://github.com/openshift/hypershift/pull/3037

Bug OCPBUGS-25707: "Oh no! Something went wrong" in Topology -> Observe tab

This is a clone of issue OCPBUGS-25441. The following is the description of the original issue:
—
Description of problem:

    "Oh no! Something went wrong" in Topology -> Observe tab

Version-Release number of selected component (if applicable):

    4.15.0-0.nightly-2023-12-14-115151

How reproducible:

    Always

Steps to Reproduce:

    1. Navigate to Topology, click a deployment, and go to the Observe tab.

Actual results:

    The page crashed.
Error description:
Component trace:
    at te (https://console-openshift-console.apps.qe-uidaily-1215.qe.devcluster.openshift.com/static/vendor-plugins-shared~main-chunk-b3bd2b20c770a4e73b50.min.js:31:9773)
    at j (https://console-openshift-console.apps.qe-uidaily-1215.qe.devcluster.openshift.com/static/vendor-plugins-shared~main-chunk-b3bd2b20c770a4e73b50.min.js:12:3324)
    at div
    at s (https://console-openshift-console.apps.qe-uidaily-1215.qe.devcluster.openshift.com/static/vendor-patternfly-5~main-chunk-c9c3c11a060d045a85da.min.js:60:70124)
    at div
    at g (https://console-openshift-console.apps.qe-uidaily-1215.qe.devcluster.openshift.com/static/vendor-patternfly-5~main-chunk-c9c3c11a060d045a85da.min.js:6:11163)
    at div
    at d (https://console-openshift-console.apps.qe-uidaily-1215.qe.devcluster.openshift.com/static/vendor-patternfly-5~main-chunk-c9c3c11a060d045a85da.min.js:1:174472)
    at t.a (https://console-openshift-console.apps.qe-uidaily-1215.qe.devcluster.openshift.com/static/dev-console/code-refs/topology-chunk-769d28af48dd4b29136f.min.js:1:487478)
    at t.a (https://console-openshift-console.apps.qe-uidaily-1215.qe.devcluster.openshift.com/static/dev-console/code-refs/topology-chunk-769d28af48dd4b29136f.min.js:1:486390)
    at div
    at l (https://console-openshift-console.apps.qe-uidaily-1215.qe.devcluster.openshift.com/static/vendor-patternfly-5~main-chunk-c9c3c11a060d045a85da.min.js:60:106304)
    at div

Expected results:
    The page should not crash.

Additional info:

https://github.com/openshift/console/pull/13462

Bug OCPBUGS-19172: Update 4.15 ose-azure-disk-csi-driver-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/azure-disk-csi-driver-operator/pull/98

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Bug OCPBUGS-24121: Update 4.15 operator-registry-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/operator-framework-olm/pull/621

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/operator-framework-olm/pull/621

Bug OCPBUGS-10652: hybrid overlay VXLAN traffic should skip conntrack like GENEVE does

All our tunnel traffic, whether GENEVE or VXLAN, should skip conntrack in the host network namespace because it's pointless to track it. It's UDP and it's point-to-point; there are no connections to care about.

We already skip the GENEVE traffic in OVN-K and the VXLAN traffic in SDN, but we aren't skipping the VXLAN traffic that Hybrid Overlay and ICNIv1 generate.

CNO's ovnkube-node YAML should add a few lines that apply -j NOTRACK to .OVNHybridOverlayVXLANPort when Hybrid Overlay is enabled. Note that .OVNHybridOverlayVXLANPort is empty when the default VXLAN port is used, so a small if/else is needed to apply -j NOTRACK to the default port whenever .OVNHybridOverlayVXLANPort is empty.
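The if/else described above could look roughly like the following Go sketch. It makes two assumptions. First, the default Hybrid Overlay VXLAN port is the IANA-assigned 4789. Second, the rules belong in the raw table's PREROUTING and OUTPUT chains, mirroring how GENEVE is skipped. `notrackRules` is a hypothetical helper that only builds iptables arguments. It does not reflect how CNO actually templates ovnkube-node.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// defaultVXLANPort is the IANA-assigned VXLAN UDP port. Assumption: this is
// what Hybrid Overlay uses when .OVNHybridOverlayVXLANPort is empty.
const defaultVXLANPort = 4789

// notrackRules builds iptables arguments that exempt Hybrid Overlay VXLAN
// traffic from conntrack, for both received (PREROUTING) and locally
// generated (OUTPUT) packets. An empty configuredPort falls back to the
// default port.
func notrackRules(configuredPort string) ([][]string, error) {
	port := strconv.Itoa(defaultVXLANPort)
	if configuredPort != "" {
		if _, err := strconv.ParseUint(configuredPort, 10, 16); err != nil {
			return nil, fmt.Errorf("invalid VXLAN port %q: %w", configuredPort, err)
		}
		port = configuredPort
	}
	var rules [][]string
	for _, chain := range []string{"PREROUTING", "OUTPUT"} {
		rules = append(rules, []string{
			"-t", "raw", "-A", chain, "-p", "udp", "--dport", port, "-j", "NOTRACK",
		})
	}
	return rules, nil
}

func main() {
	// "" means "default port"; 9789 is an arbitrary example of a custom port.
	for _, configured := range []string{"", "9789"} {
		rules, err := notrackRules(configured)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		for _, r := range rules {
			fmt.Println("iptables " + strings.Join(r, " "))
		}
	}
}
```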

https://github.com/openshift/cluster-network-operator/pull/1819

Bug OCPBUGS-16783: Chore: Update OWNERS and OWNERS_ALIASES in CSI driver and operator repos

Sanitize OWNERS/OWNERS_ALIASES in all CSI driver and operator repos.

For driver repos:

1) OWNERS must have `component`:

```
component: "Storage / Kubernetes External Components"
```

2) OWNERS_ALIASES must list all members of the Storage team.

For operator repos:

1) OWNERS must have:

all members of the Storage team as `approvers`, and
`component`:
```
component: "Storage / Operators"
```

https://github.com/openshift/cloud-provider-openstack/pull/207

Bug OCPBUGS-18720: Catalog pods in hypershift control plane in ImagePullBackOff

Description of problem:

Catalog pods in hypershift control plane in ImagePullBackOff

Version-Release number of selected component (if applicable):

How reproducible:

always

Steps to Reproduce:

1. Create a cluster in 4.14 HO + OCP 4.14.0-0.ci-2023-09-07-120503
2. Check the control plane pods: the catalog pods in the control plane namespace are in ImagePullBackOff.

Actual results:

jiezhao-mac:hypershift jiezhao$ oc get pods -n clusters-jie-test | grep catalog
catalog-operator-64fd787d9c-98wx5                     2/2     Running            0          2m43s
certified-operators-catalog-7766fc5b8-4s66z           0/1     ImagePullBackOff   0          2m43s
community-operators-catalog-847cdbff6-wsf74           0/1     ImagePullBackOff   0          2m43s
redhat-marketplace-catalog-fccc6bbb5-2d5x4            0/1     ImagePullBackOff   0          2m43s
redhat-operators-catalog-86b6f66d5d-mpdsc             0/1     ImagePullBackOff   0          2m43s

Events:
  Type     Reason          Age                 From               Message
  ----     ------          ----                ----               -------
  Normal   Scheduled       65m                 default-scheduler  Successfully assigned clusters-jie-test/certified-operators-catalog-7766fc5b8-4s66z to ip-10-0-64-135.us-east-2.compute.internal
  Normal   AddedInterface  65m                 multus             Add eth0 [10.128.2.141/23] from openshift-sdn
  Normal   Pulling         63m (x4 over 65m)   kubelet            Pulling image "from:imagestream"
  Warning  Failed          63m (x4 over 65m)   kubelet            Failed to pull image "from:imagestream": rpc error: code = Unknown desc = reading manifest imagestream in docker.io/library/from: requested access to the resource is denied
  Warning  Failed          63m (x4 over 65m)   kubelet            Error: ErrImagePull
  Warning  Failed          63m (x6 over 65m)   kubelet            Error: ImagePullBackOff
  Normal   BackOff         9s (x280 over 65m)  kubelet            Back-off pulling image "from:imagestream"
jiezhao-mac:hypershift jiezhao$

Expected results:

catalog pods are running

Additional info:

slack:
https://redhat-internal.slack.com/archives/C01C8502FMM/p1694170060144859

https://github.com/openshift/hypershift/pull/3001

Bug OCPBUGS-19059: baremetal 4.14.0-rc.0 ipv6 sno cluster, no Observe menu on admin console, monitoring-plugin is failed

Description of problem:

Failed to get a valid plugin manifest from /api/plugins/monitoring-plugin/
r: Bad Gateway

The console logs show "9443: connect: connection refused":

$ oc -n openshift-console logs console-6869f8f4f4-56mbj
...
E0915 12:50:15.498589       1 handlers.go:164] GET request for "monitoring-plugin" plugin failed: Get "https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json": dial tcp [fd02::f735]:9443: connect: connection refused
2023/09/15 12:50:15 http: panic serving [fd01:0:0:1::2]:39156: runtime error: invalid memory address or nil pointer dereference
goroutine 183760 [running]:
net/http.(*conn).serve.func1()
    /usr/lib/golang/src/net/http/server.go:1854 +0xbf
panic({0x3259140, 0x4fcc150})
    /usr/lib/golang/src/runtime/panic.go:890 +0x263
github.com/openshift/console/pkg/plugins.(*PluginsHandler).proxyPluginRequest(0xc0003b5760, 0x2?, {0xc0009bc7d1, 0x11}, {0x3a41fa0, 0xc0002f6c40}, 0xb?)
    /go/src/github.com/openshift/console/pkg/plugins/handlers.go:165 +0x582
github.com/openshift/console/pkg/plugins.(*PluginsHandler).HandlePluginAssets(0xaa00000000000010?, {0x3a41fa0, 0xc0002f6c40}, 0xc0001f7500)
    /go/src/github.com/openshift/console/pkg/plugins/handlers.go:147 +0x26d
github.com/openshift/console/pkg/server.(*Server).HTTPHandler.func23({0x3a41fa0?, 0xc0002f6c40?}, 0x7?)
    /go/src/github.com/openshift/console/pkg/server/server.go:604 +0x33
net/http.HandlerFunc.ServeHTTP(...)
    /usr/lib/golang/src/net/http/server.go:2122
github.com/openshift/console/pkg/server.authMiddleware.func1(0xc0001f7500?, {0x3a41fa0?, 0xc0002f6c40?}, 0xd?)
    /go/src/github.com/openshift/console/pkg/server/middleware.go:25 +0x31
github.com/openshift/console/pkg/server.authMiddlewareWithUser.func1({0x3a41fa0, 0xc0002f6c40}, 0xc0001f7500)
    /go/src/github.com/openshift/console/pkg/server/middleware.go:81 +0x46c
net/http.HandlerFunc.ServeHTTP(0x5120938?, {0x3a41fa0?, 0xc0002f6c40?}, 0x7ffb6ea27f18?)
    /usr/lib/golang/src/net/http/server.go:2122 +0x2f
net/http.StripPrefix.func1({0x3a41fa0, 0xc0002f6c40}, 0xc0001f7400)
    /usr/lib/golang/src/net/http/server.go:2165 +0x332
net/http.HandlerFunc.ServeHTTP(0xc001102c00?, {0x3a41fa0?, 0xc0002f6c40?}, 0xc000655a00?)
    /usr/lib/golang/src/net/http/server.go:2122 +0x2f
net/http.(*ServeMux).ServeHTTP(0x34025e0?, {0x3a41fa0, 0xc0002f6c40}, 0xc0001f7400)
    /usr/lib/golang/src/net/http/server.go:2500 +0x149
github.com/openshift/console/pkg/server.securityHeadersMiddleware.func1({0x3a41fa0, 0xc0002f6c40}, 0x3305040?)
    /go/src/github.com/openshift/console/pkg/server/middleware.go:128 +0x3af
net/http.HandlerFunc.ServeHTTP(0x0?, {0x3a41fa0?, 0xc0002f6c40?}, 0x11db52e?)
    /usr/lib/golang/src/net/http/server.go:2122 +0x2f
net/http.serverHandler.ServeHTTP({0xc0008201e0?}, {0x3a41fa0, 0xc0002f6c40}, 0xc0001f7400)
    /usr/lib/golang/src/net/http/server.go:2936 +0x316
net/http.(*conn).serve(0xc0009b4120, {0x3a43e70, 0xc001223500})
    /usr/lib/golang/src/net/http/server.go:1995 +0x612
created by net/http.(*Server).Serve
    /usr/lib/golang/src/net/http/server.go:3089 +0x5ed
I0915 12:50:24.267777       1 handlers.go:118] User settings ConfigMap "user-settings-4b4c2f4d-159c-4358-bba3-3d87f113cd9b" already exist, will return existing data.
I0915 12:50:24.267813       1 handlers.go:118] User settings ConfigMap "user-settings-4b4c2f4d-159c-4358-bba3-3d87f113cd9b" already exist, will return existing data.
E0915 12:50:30.155515       1 handlers.go:164] GET request for "monitoring-plugin" plugin failed: Get "https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json": dial tcp [fd02::f735]:9443: connect: connection refused
2023/09/15 12:50:30 http: panic serving [fd01:0:0:1::2]:42990: runtime error: invalid memory address or nil pointer dereference

Connections to port 9443 are refused:

$ oc -n openshift-monitoring get pod -o wide
NAME                                                     READY   STATUS    RESTARTS   AGE     IP                  NODE    NOMINATED NODE   READINESS GATES
alertmanager-main-0                                      6/6     Running   6          3d22h   fd01:0:0:1::564     sno-2   <none>           <none>
cluster-monitoring-operator-6cb777d488-nnpmx             1/1     Running   4          7d16h   fd01:0:0:1::12      sno-2   <none>           <none>
kube-state-metrics-dc5f769bc-p97m7                       3/3     Running   12         7d16h   fd01:0:0:1::3b      sno-2   <none>           <none>
monitoring-plugin-85bfb98485-d4g5x                       1/1     Running   4          7d16h   fd01:0:0:1::55      sno-2   <none>           <none>
node-exporter-ndnnj                                      2/2     Running   8          7d16h   2620:52:0:165::41   sno-2   <none>           <none>
openshift-state-metrics-78df59b4d5-j6r5s                 3/3     Running   12         7d16h   fd01:0:0:1::3a      sno-2   <none>           <none>
prometheus-adapter-6f86f7d8f5-ttflf                      1/1     Running   0          4h23m   fd01:0:0:1::b10c    sno-2   <none>           <none>
prometheus-k8s-0                                         6/6     Running   6          3d22h   fd01:0:0:1::566     sno-2   <none>           <none>
prometheus-operator-7c94855989-csts2                     2/2     Running   8          7d16h   fd01:0:0:1::39      sno-2   <none>           <none>
prometheus-operator-admission-webhook-7bb64b88cd-bvq8m   1/1     Running   4          7d16h   fd01:0:0:1::37      sno-2   <none>           <none>
thanos-querier-5bbb764599-vlztq                          6/6     Running   6          3d22h   fd01:0:0:1::56a     sno-2   <none>           <none>

$  oc -n openshift-monitoring get svc monitoring-plugin
NAME                TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
monitoring-plugin   ClusterIP   fd02::f735   <none>        9443/TCP   7d16h


$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -v 'https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json' | jq
*   Trying fd02::f735...
* TCP_NODELAY set
* connect to fd02::f735 port 9443 failed: Connection refused
* Failed to connect to monitoring-plugin.openshift-monitoring.svc.cluster.local port 9443: Connection refused
* Closing connection 0
curl: (7) Failed to connect to monitoring-plugin.openshift-monitoring.svc.cluster.local port 9443: Connection refused
command terminated with exit code 7

The issue does not occur on another 4.14.0-rc.0 IPv4 cluster, but it does reproduce on another 4.14.0-rc.0 IPv6 cluster.
On the 4.14.0-rc.0 IPv4 cluster:

$ oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-rc.0   True        False         20m     Cluster version is 4.14.0-rc.0

$ oc -n openshift-monitoring get pod -o wide | grep monitoring-plugin
monitoring-plugin-85bfb98485-nh428                       1/1     Running   0          4m      10.128.0.107   ci-ln-pby4bj2-72292-l5q8v-master-0   <none>           <none>

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k  'https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json' | jq
...
{
  "name": "monitoring-plugin",
  "version": "1.0.0",
  "displayName": "OpenShift console monitoring plugin",
  "description": "This plugin adds the monitoring UI to the OpenShift web console",
  "dependencies": {
    "@console/pluginAPI": "*"
  },
  "extensions": [
    {
      "type": "console.page/route",
      "properties": {
        "exact": true,
        "path": "/monitoring",
        "component": {
          "$codeRef": "MonitoringUI"
        }
      }
    },
...

The "9443: Connection refused" issue reproduces on a 4.14.0-rc.0 IPv6 cluster (launched with cluster-bot: launch 4.14.0-rc.0 metal,ipv6) after logging in to the console:

$ oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-rc.0   True        False         44m     Cluster version is 4.14.0-rc.0
$ oc -n openshift-monitoring get pod -o wide | grep monitoring-plugin
monitoring-plugin-bd6ffdb5d-b5csk                        1/1     Running   0          53m   fd01:0:0:4::b             worker-0.ostest.test.metalkube.org   <none>           <none>
monitoring-plugin-bd6ffdb5d-vhtpf                        1/1     Running   0          53m   fd01:0:0:5::9             worker-2.ostest.test.metalkube.org   <none>           <none>
$ oc -n openshift-monitoring get svc monitoring-plugin
NAME                TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
monitoring-plugin   ClusterIP   fd02::402d   <none>        9443/TCP   59m

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -v 'https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json' | jq
*   Trying fd02::402d...
* TCP_NODELAY set
* connect to fd02::402d port 9443 failed: Connection refused
* Failed to connect to monitoring-plugin.openshift-monitoring.svc.cluster.local port 9443: Connection refused
* Closing connection 0
curl: (7) Failed to connect to monitoring-plugin.openshift-monitoring.svc.cluster.local port 9443: Connection refused
command terminated with exit code 7
$ oc -n openshift-console get pod | grep console
console-5cffbc7964-7ljft     1/1     Running   0          56m
console-5cffbc7964-d864q     1/1     Running   0          56m
$ oc -n openshift-console logs console-5cffbc7964-7ljft
...
E0916 14:34:16.330117       1 handlers.go:164] GET request for "monitoring-plugin" plugin failed: Get "https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json": dial tcp [fd02::402d]:9443: connect: connection refused
2023/09/16 14:34:16 http: panic serving [fd01:0:0:4::2]:37680: runtime error: invalid memory address or nil pointer dereference
goroutine 3985 [running]:
net/http.(*conn).serve.func1()
    /usr/lib/golang/src/net/http/server.go:1854 +0xbf
panic({0x3259140, 0x4fcc150})
    /usr/lib/golang/src/runtime/panic.go:890 +0x263
github.com/openshift/console/pkg/plugins.(*PluginsHandler).proxyPluginRequest(0xc0008f6780, 0x2?, {0xc000665211, 0x11}, {0x3a41fa0, 0xc0009221c0}, 0xb?)
    /go/src/github.com/openshift/console/pkg/plugins/handlers.go:165 +0x582
github.com/openshift/console/pkg/plugins.(*PluginsHandler).HandlePluginAssets(0xfe00000000000010?, {0x3a41fa0, 0xc0009221c0}, 0xc000d8d600)
    /go/src/github.com/openshift/console/pkg/plugins/handlers.go:147 +0x26d
github.com/openshift/console/pkg/server.(*Server).HTTPHandler.func23({0x3a41fa0?, 0xc0009221c0?}, 0x7?)
    /go/src/github.com/openshift/console/pkg/server/server.go:604 +0x33
net/http.HandlerFunc.ServeHTTP(...)
    /usr/lib/golang/src/net/http/server.go:2122
github.com/openshift/console/pkg/server.authMiddleware.func1(0xc000d8d600?, {0x3a41fa0?, 0xc0009221c0?}, 0xd?)
    /go/src/github.com/openshift/console/pkg/server/middleware.go:25 +0x31
github.com/openshift/console/pkg/server.authMiddlewareWithUser.func1({0x3a41fa0, 0xc0009221c0}, 0xc000d8d600)
    /go/src/github.com/openshift/console/pkg/server/middleware.go:81 +0x46c
net/http.HandlerFunc.ServeHTTP(0xc000653830?, {0x3a41fa0?, 0xc0009221c0?}, 0x7f824506bf18?)
    /usr/lib/golang/src/net/http/server.go:2122 +0x2f
net/http.StripPrefix.func1({0x3a41fa0, 0xc0009221c0}, 0xc000d8d500)
    /usr/lib/golang/src/net/http/server.go:2165 +0x332
net/http.HandlerFunc.ServeHTTP(0xc00007e800?, {0x3a41fa0?, 0xc0009221c0?}, 0xc000b2da00?)
    /usr/lib/golang/src/net/http/server.go:2122 +0x2f
net/http.(*ServeMux).ServeHTTP(0x34025e0?, {0x3a41fa0, 0xc0009221c0}, 0xc000d8d500)
    /usr/lib/golang/src/net/http/server.go:2500 +0x149
github.com/openshift/console/pkg/server.securityHeadersMiddleware.func1({0x3a41fa0, 0xc0009221c0}, 0x3305040?)
    /go/src/github.com/openshift/console/pkg/server/middleware.go:128 +0x3af
net/http.HandlerFunc.ServeHTTP(0x0?, {0x3a41fa0?, 0xc0009221c0?}, 0x11db52e?)
    /usr/lib/golang/src/net/http/server.go:2122 +0x2f
net/http.serverHandler.ServeHTTP({0xc000db9b00?}, {0x3a41fa0, 0xc0009221c0}, 0xc000d8d500)
    /usr/lib/golang/src/net/http/server.go:2936 +0x316
net/http.(*conn).serve(0xc000653680, {0x3a43e70, 0xc000676f30})
    /usr/lib/golang/src/net/http/server.go:1995 +0x612
created by net/http.(*Server).Serve
    /usr/lib/golang/src/net/http/server.go:3089 +0x5ed

Version-Release number of selected component (if applicable):

baremetal 4.14.0-rc.0 IPv6 SNO cluster:
$ token=`oc create token prometheus-k8s -n openshift-monitoring`
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query=virt_platform'  | jq
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "virt_platform",
          "baseboard_manufacturer": "Dell Inc.",
          "baseboard_product_name": "01J4WF",
          "bios_vendor": "Dell Inc.",
          "bios_version": "1.10.2",
          "container": "kube-rbac-proxy",
          "endpoint": "https",
          "instance": "sno-2",
          "job": "node-exporter",
          "namespace": "openshift-monitoring",
          "pod": "node-exporter-ndnnj",
          "prometheus": "openshift-monitoring/k8s",
          "service": "node-exporter",
          "system_manufacturer": "Dell Inc.",
          "system_product_name": "PowerEdge R750",
          "system_version": "Not Specified",
          "type": "none"
        },
        "value": [
          1694785092.664,
          "1"
        ]
      }
    ]
  }
}

How reproducible:

Reproduces on IPv6 clusters.

Steps to Reproduce:

1. See the description.

Actual results:

There is no Observe menu in the admin console, and the monitoring-plugin fails to load.

Expected results:

no error

https://github.com/openshift/cluster-monitoring-operator/pull/2090

Task MGMT-16011: Reduce agent image size

The agent container image is currently ~770MB. On slow networks, this can take a long time to download, and users don't know why their host isn't being discovered.

Some suggestions from Omer Tuchfeld:

Change all step binaries to a single binary that inspects argv[0] to determine how it should behave, the rest being symlinks to the one binary (hyperkube / busybox-style)
Strip debug information from the agent binaries
Remove nmap as a dependency, it's probably overkill for our purposes
Or at least delete the nmap cracklib directory
Remove /usr/share/doc
We don't need the entire /usr/share/misc/magic database, stop using file and use a simpler detection for MBR partition
Remove X11
Remove licenses (or maybe compress them if it's legally problematic)
Remove man
Remove grub; this is a container, so it doesn't boot
I'm not sure where all the ceph stuff is coming from
Remove Python - where are we even using Python in the agent?
Look into the libicudata and libmozjs things, I'm not sure we need them (or what they are)
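The first suggestion, hyperkube/busybox-style dispatch, can be sketched in Go as follows. The step names and handlers are hypothetical placeholders, not the assisted-installer-agent's actual step binaries. Each step would be installed as a symlink to the single binary, and the binary picks its behavior from the base name of argv[0].

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// steps maps each hypothetical step name to its entry point. In a real
// binary these would call into the existing per-step packages.
var steps = map[string]func(args []string) int{
	"inventory": func(args []string) int {
		fmt.Println("running inventory step with", args)
		return 0
	},
	"free_addresses": func(args []string) int {
		fmt.Println("running free_addresses step with", args)
		return 0
	},
}

// dispatch selects the step from the name the binary was invoked as. For
// example, /usr/bin/inventory (a symlink to the agent binary) runs the
// inventory step.
func dispatch(argv0 string, args []string) int {
	name := filepath.Base(argv0)
	step, ok := steps[name]
	if !ok {
		fmt.Fprintf(os.Stderr, "unknown step %q\n", name)
		return 127 // conventional "command not found" exit status
	}
	return step(args)
}

func main() {
	os.Exit(dispatch(os.Args[0], os.Args[1:]))
}
```

A single statically linked binary shares the Go runtime and common dependencies across all steps. Compared with shipping one binary per step, this is where most of the size saving would come from.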

https://github.com/openshift/assisted-installer-agent/pull/617

Bug OCPBUGS-23083: Cluster Network Operator needs additional RBAC permission to deploy network-node-identity when Calico is the network type

Description of problem:

When the network type for a hosted cluster is Calico, the RBAC policies laid down for CNO do not include the permissions needed to deploy network-node-identity.

Version-Release number of selected component (if applicable):

How reproducible: IBM Satellite environment

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/hypershift/pull/3172

Bug OCPBUGS-23432: OCP installation is failing because the VIP is not being allocated to the bootstrap node

Description of problem:

OCPv4.14.1 installation is failing because the VIP is not being allocated to the bootstrap node.

Version-Release number of selected component (if applicable):

OCPv4.14.1

How reproducible:

100% --> https://access.redhat.com/support/cases/#/case/03668010

Steps to Reproduce:

1.
2.
3.

Actual results:

https://access.redhat.com/support/cases/#/case/03668010/discussion?commentId=a0a6R00000Vmdf3QAB

Expected results:

OCP installation completes successfully.

Additional info:

The comment https://access.redhat.com/support/cases/#/case/03668010/discussion?commentId=a0a6R00000Vmdf3QAB describes the current state and the issue. If additional logs are required, I can arrange for them.

Bug OCPBUGS-23855: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/csi-driver-shared-resource-operator/pull/93

Bug OCPBUGS-21640: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/machine-api-provider-aws/pull/85

Bug OCPBUGS-19114: Update 4.15 csi-provisioner image to be consistent with ART

Please review the following PR: https://github.com/openshift/csi-external-provisioner/pull/69

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/csi-external-provisioner/pull/69

Bug OCPBUGS-24820: Update 4.16 ose-baremetal-installer-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/installer/pull/7817

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/installer/pull/7817

Bug OCPBUGS-23764: After PatternFly5 update: Form/YAML switchers are misaligned

Issue 29 from https://docs.google.com/spreadsheets/d/1TR3ENY-GE_LQL9F-xH6NahtRHdu6IrYGhLqDDA8y0EI/edit#gid=1035185624

The Form view and YAML view switches were previously aligned horizontally; now they are aligned vertically.

This happens at least on the Helm form.

Screenshot: https://drive.google.com/file/d/1nzFHCeorlVIMbwlnjzEc1fCW0GXQa1KT/view

https://github.com/openshift/console/pull/13380

Bug OCPBUGS-27261: Environment file /etc/kubernetes/node.env is overwritten after a node restart

Description of problem:

    Environment file /etc/kubernetes/node.env is overwritten after node restart. 

There is a typo in https://github.com/openshift/machine-config-operator/blob/master/templates/common/aws/files/usr-local-bin-aws-kubelet-nodename.yaml: the variable NODENV should be changed to NODEENV wherever it appears.

Version-Release number of selected component (if applicable):

How reproducible:

  Easy

Steps to Reproduce:

    1. Change contents of /etc/kubernetes/node.env
    2. Restart node
    3. Notice changes are lost

Actual results:

Expected results:

     /etc/kubernetes/node.env should not be changed after restart of a node

Additional info:

https://github.com/openshift/machine-config-operator/pull/4126

Bug OCPBUGS-18996: "Create StorageClass" form breaks when a dynamic provisioner is selected

Description of problem:

Please check: https://issues.redhat.com/browse/OCPBUGS-18702?focusedId=23021716&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-23021716 for more details.

https://drive.google.com/drive/folders/14aSJs-lO6HC-2xYFlOTJtCZIQg3ekE85?usp=sharing (see the recording "sc_form_typeerror.mp4").

Issues:
1. TypeError mentioned above.
2. Default params added by an extension are not getting added to the created StorageClass.
3. Validation for parameters added by an extension is not working correctly either.
4. The provisioner child details get stuck once the user selects 'openshift-storage.cephfs.csi.ceph.com'.

Version-Release number of selected component (if applicable):

4.14 (OCP)

How reproducible:

Steps to Reproduce:

1. Install ODF operator.
2. Create StorageSystem (once dynamic plugin is loaded).
3. Wait for the ODF-related StorageClasses to be created.
4. Once they are created, go to the "Create StorageClass" form.
5. Switch to a provisioner added by the ODF dynamic plugin (rbd.csi.ceph).

Actual results:

Page breaks with an error.

Expected results:

Page should not break.
Functionality should also match how it behaved before the refactoring introduced by PR: https://github.com/openshift/console/pull/13036

Additional info:

Stack trace:
Caught error in a child component: TypeError: Cannot read properties of undefined (reading 'parameters')
    at allRequiredFieldsFilled (storage-class-form.tsx:204:1)
    at validateForm (storage-class-form.tsx:235:1)
    at storage-class-form.tsx:262:1
    at invokePassiveEffectCreate (react-dom.development.js:23487:1)
    at HTMLUnknownElement.callCallback (react-dom.development.js:3945:1)
    at Object.invokeGuardedCallbackDev (react-dom.development.js:3994:1)
    at invokeGuardedCallback (react-dom.development.js:4056:1)
    at flushPassiveEffectsImpl (react-dom.development.js:23574:1)
    at unstable_runWithPriority (scheduler.development.js:646:1)
    at runWithPriority$1 (react-dom.development.js:11276:1) {componentStack: '\n    at StorageClassFormInner (http://localhost:90...c03030668ef271da51f.js:491534:20)\n    at Suspense'}

https://github.com/openshift/console/pull/13153

Bug OCPBUGS-19391: CVO hotloops on ClusterRoleBinding cluster-baremetal-operator and ConfigMap openshift-machine-config-operator/kube-rbac-proxy

Description of problem:

In a 4.14 cluster, I'm seeing CVO hotloops on ClusterRoleBinding cluster-baremetal-operator and ConfigMap openshift-machine-config-operator/kube-rbac-proxy with empty ManagedFields.  

# oc logs  cluster-version-operator-7cf78c4f65-hfh7f -n openshift-cluster-version | grep -o 'Updating .*due to diff'| sort | uniq -c
     93 Updating ClusterRoleBinding cluster-baremetal-operator due to diff
     93 Updating ClusterRole machine-api-operator-ext-remediation due to diff
     93 Updating ConfigMap openshift-machine-config-operator/kube-rbac-proxy due to diff

CVO logs the diff as below:

I0919 10:19:24.658975       1 rbac.go:38] Updating ClusterRoleBinding cluster-baremetal-operator due to diff:   &v1.ClusterRoleBinding{
      TypeMeta: v1.TypeMeta{
-         Kind:       "",
+         Kind:       "ClusterRoleBinding",
-         APIVersion: "",
+         APIVersion: "rbac.authorization.k8s.io/v1",
      },
      ObjectMeta: v1.ObjectMeta{
          ... // 2 identical fields
          Namespace:                  "openshift-machine-api",
          SelfLink:                   "",
-         UID:                        "cb8a7ffe-9966-4224-b1b6-3e7db6da7009",
+         UID:                        "",
-         ResourceVersion:            "2571",
+         ResourceVersion:            "",
          Generation:                 0,
-         CreationTimestamp:          v1.Time{Time: s"2023-09-19 03:02:31 +0000 UTC"},
+         CreationTimestamp:          v1.Time{},
          DeletionTimestamp:          nil,
          DeletionGracePeriodSeconds: nil,
          ... // 2 identical fields
          OwnerReferences: {{APIVersion: "config.openshift.io/v1", Kind: "ClusterVersion", Name: "version", UID: "fb1c6e8c-01bc-415f-8b55-c55a4601bd10", ...}},
          Finalizers:      nil,
-         ManagedFields: []v1.ManagedFieldsEntry{
-             {
-                 Manager:    "cluster-version-operator",
-                 Operation:  "Update",
-                 APIVersion: "rbac.authorization.k8s.io/v1",
-                 Time:       s"2023-09-19 03:02:31 +0000 UTC",
-                 FieldsType: "FieldsV1",
-                 FieldsV1:   s`{"f:metadata":{"f:annotations":{".":{},"f:capability.openshift.i`...,
-             },
-         },
+         ManagedFields: nil,
      },
      Subjects: {{Kind: "ServiceAccount", Name: "cluster-baremetal-operator", Namespace: "openshift-machine-api"}},
      RoleRef:  {APIGroup: "rbac.authorization.k8s.io", Kind: "ClusterRole", Name: "cluster-baremetal-operator"},
  }

...

I0919 10:14:55.572553       1 core.go:138] Updating ConfigMap openshift-machine-config-operator/kube-rbac-proxy due to diff:   &v1.ConfigMap{
      TypeMeta: v1.TypeMeta{
-         Kind:       "",
+         Kind:       "ConfigMap",
-         APIVersion: "",
+         APIVersion: "v1",
      },
      ObjectMeta: v1.ObjectMeta{
          ... // 2 identical fields
          Namespace:                  "openshift-machine-config-operator",
          SelfLink:                   "",
-         UID:                        "9c6c667f-8e10-4fca-8c1d-c8c0fc158ee5",
+         UID:                        "",
-         ResourceVersion:            "164024",
+         ResourceVersion:            "",
          Generation:                 0,
-         CreationTimestamp:          v1.Time{Time: s"2023-09-19 03:01:42 +0000 UTC"},
+         CreationTimestamp:          v1.Time{},
          DeletionTimestamp:          nil,
          DeletionGracePeriodSeconds: nil,
          ... // 2 identical fields
          OwnerReferences: {{APIVersion: "config.openshift.io/v1", Kind: "ClusterVersion", Name: "version", UID: "fb1c6e8c-01bc-415f-8b55-c55a4601bd10", ...}},
          Finalizers:      nil,
-         ManagedFields: []v1.ManagedFieldsEntry{
-             {
-                 Manager:    "cluster-version-operator",
-                 Operation:  "Update",
-                 APIVersion: "v1",
-                 Time:       s"2023-09-19 10:10:23 +0000 UTC",
-                 FieldsType: "FieldsV1",
-                 FieldsV1:   s`{"f:data":{},"f:metadata":{"f:annotations":{".":{},"f:include.re`...,
-             },
-             {
-                 Manager:    "machine-config-operator",
-                 Operation:  "Update",
-                 APIVersion: "v1",
-                 Time:       s"2023-09-19 10:10:25 +0000 UTC",
-                 FieldsType: "FieldsV1",
-                 FieldsV1:   s`{"f:data":{"f:config-file.yaml":{}}}`,
-             },
-         },
+         ManagedFields: nil,
      },
      Immutable:  nil,
      Data:       {"config-file.yaml": "authorization:\n  resourceAttributes:\n    apiVersion: v1\n    reso"...},
      BinaryData: nil,
  }

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-09-15-233408

How reproducible:

1/1

Steps to Reproduce:

1. Install a 4.14 cluster

Actual results:

CVO hotloops on ClusterRoleBinding cluster-baremetal-operator and ConfigMap openshift-machine-config-operator/kube-rbac-proxy

Expected results:

CVO doesn't hotloop on resources with empty ManagedFields

Additional info:

https://github.com/openshift/cluster-version-operator/pull/993

Bug OCPBUGS-22594: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-storage-operator/pull/414

Bug OCPBUGS-24244: Update 4.15 ose-csi-external-snapshotter-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/csi-external-snapshotter/pull/117

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Bug TRT-1359: monitor test azure-metrics-collector failed with throttle error

[Jira:"Test Framework"] monitor test azure-metrics-collector collection failure in https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/28395/pull-ci-openshift-origin-master-e2e-agnostic-ovn-cmd/1724427658311241728

It looks like Azure is throttling our requests. We should probably add a retry mechanism.

Relevant thread: https://redhat-internal.slack.com/archives/C01CQA76KMX/p1699977299650309

https://github.com/openshift/origin/pull/28420

Bug MGMT-16235: Agent controller does not watch secrets

Description of the problem:

The Agent CR can reference a Secret containing a token for pulling ignition. This is generally used by HyperShift. The agent controller takes the token from the referenced Secret and applies it to the host in the DB. However, if the token is rotated, the agent controller doesn't notice this, and the agent continues to pull ignition with the old token, which obviously fails. The agent controller must watch these Secrets so that it will reconcile when the Secret is updated.
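Watching the Secrets means mapping each Secret event back to the Agents that reference it so that those Agents are requeued. The following plain-Go sketch illustrates only that mapping step. The `Agent` type, its `TokenSecretNamespace`/`TokenSecretName` fields, and `agentsForSecret` are hypothetical simplifications, not assisted-service types. In a controller-runtime based controller, a function like this would typically be wired in through `handler.EnqueueRequestsFromMapFunc`.

```go
package main

import "fmt"

// Agent is a hypothetical, trimmed-down view of an Agent CR that may
// reference a Secret holding an ignition token.
type Agent struct {
	Namespace            string
	Name                 string
	TokenSecretNamespace string // empty when no Secret is referenced
	TokenSecretName      string
}

// agentsForSecret returns the namespace/name keys of every Agent that
// references the given Secret. A controller would enqueue these keys
// whenever the Secret changes, so that a rotated token gets reconciled.
func agentsForSecret(agents []Agent, secretNamespace, secretName string) []string {
	var keys []string
	for _, a := range agents {
		if a.TokenSecretName == secretName && a.TokenSecretNamespace == secretNamespace {
			keys = append(keys, a.Namespace+"/"+a.Name)
		}
	}
	return keys
}

func main() {
	agents := []Agent{
		{Namespace: "hc-ns", Name: "agent-1", TokenSecretNamespace: "hc-ns", TokenSecretName: "ignition-token"},
		{Namespace: "hc-ns", Name: "agent-2"},
	}
	fmt.Println(agentsForSecret(agents, "hc-ns", "ignition-token")) // [hc-ns/agent-1]
}
```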

How reproducible:

100%

Steps to reproduce:

1. Create a hosted cluster and another host to be added

2. Wait for the token to be rotated in the Secret

3. Notice that the agent is still pulling with the old token

Actual results:

The agent is still pulling with the old token

Expected results:

The agent pulls with the new token

https://github.com/openshift/assisted-service/pull/5736

Bug OCPBUGS-25191: [azure] using marketplace image fails while retrieving the image

Description of problem:

    https://github.com/openshift/installer/pull/7778 introduced a bug where an error is always returned while retrieving a marketplace image.

Version-Release number of selected component (if applicable):

How reproducible:

    always

Steps to Reproduce:

    1. Configure a marketplace image in the install-config
    2. openshift-install create manifests

Actual results:

    $ ./openshift-install create manifests --dir ipi1 --log-level debug
DEBUG OpenShift Installer 4.16.0-0.test-2023-12-12-020559-ci-ln-xkqmlqk-latest 
DEBUG Built from commit 456ae720a83e39dffd9918c5a71388ad873b6a38 
DEBUG Fetching Master Machines...                  
DEBUG Loading Master Machines...                   
DEBUG   Loading Cluster ID...                      
DEBUG     Loading Install Config...                
DEBUG       Loading SSH Key...                     
DEBUG       Loading Base Domain...                 
DEBUG         Loading Platform...                  
DEBUG       Loading Cluster Name...                
DEBUG         Loading Base Domain...               
DEBUG         Loading Platform...                  
DEBUG       Loading Pull Secret...                 
DEBUG       Loading Platform...                    
INFO Credentials loaded from file "/home/fedora/.azure/osServicePrincipal.json" 
ERROR failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: [controlPlane.platform.azure.osImage: Invalid value: azure.OSImage{Plan:"", Publisher:"redhat", Offer:"rh-ocp-worker", SKU:"rh-ocp-worker", Version:"413.92.2023101700"}: could not get marketplace image: %!w(<nil>), compute[0].platform.azure.osImage: Invalid value: azure.OSImage{Plan:"", Publisher:"redhat", Offer:"rh-ocp-worker", SKU:"rh-ocp-worker", Version:"413.92.2023101700"}: could not get marketplace image: %!w(<nil>)]

Expected results:

    Success

Additional info:

    When {{errors.Wrap(err, ...)}} was replaced by {{fmt.Errorf(...)}}, the behavior changed subtly: {{errors.Wrap}} returns {{nil}} if {{err}} is {{nil}}, but {{fmt.Errorf}} always returns an error.
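The difference can be reproduced with the standard library alone. In the following sketch, `wrapLikePkgErrors` is a hypothetical helper that mimics the nil passthrough of `errors.Wrap` from github.com/pkg/errors, and `getMarketplaceImage` is a hypothetical lookup that succeeds. Passing a nil error to the `%w` verb is what produces the `%!w(<nil>)` text visible in the installer output above.

```go
package main

import "fmt"

// getMarketplaceImage is a hypothetical lookup that succeeds (returns nil).
func getMarketplaceImage() error { return nil }

// wrapLikePkgErrors mimics errors.Wrap from github.com/pkg/errors:
// it returns nil when err is nil, otherwise it annotates err.
func wrapLikePkgErrors(err error, msg string) error {
	if err == nil {
		return nil
	}
	return fmt.Errorf("%s: %w", msg, err)
}

// buggyCheck reproduces the regression: fmt.Errorf always returns a
// non-nil error, even when the wrapped err is nil.
func buggyCheck() error {
	err := getMarketplaceImage()
	return fmt.Errorf("could not get marketplace image: %w", err)
}

// fixedCheck only calls fmt.Errorf on failure, so success stays nil.
func fixedCheck() error {
	if err := getMarketplaceImage(); err != nil {
		return fmt.Errorf("could not get marketplace image: %w", err)
	}
	return nil
}

func main() {
	fmt.Println(wrapLikePkgErrors(getMarketplaceImage(), "could not get marketplace image")) // <nil>
	fmt.Println(buggyCheck()) // could not get marketplace image: %!w(<nil>)
	fmt.Println(fixedCheck()) // <nil>
}
```

The `if err != nil` guard in `fixedCheck` is the idiomatic replacement whenever code moves from `errors.Wrap` to `fmt.Errorf`.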

https://github.com/openshift/installer/pull/7826

Bug OCPBUGS-22284: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/console/pull/13266

Bug OCPBUGS-23170: vsphere techpreview installs are failing

https://github.com/openshift/installer/pull/7418 broke techpreview installs; see https://amd64.ocp.releases.ci.openshift.org/releasestream/4.15.0-0.nightly/release/4.15.0-0.nightly-2023-11-09-094429

https://github.com/openshift/installer/pull/7708

Bug OCPBUGS-23652: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cloud-provider-nutanix/pull/27

Bug OCPBUGS-24031: Bump FCOS to latest stable

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:


Actual results:

Expected results:

Additional info:

https://github.com/openshift/installer/pull/7779

Bug OCPBUGS-25399: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-olm-operator/pull/40

Bug OCPBUGS-25818: CNV upgrades from v4.14.1 to v4.15.0 (unreleased) are not starting due to out of sync operatorCondition

This is a clone of issue OCPBUGS-25673. The following is the description of the original issue:
—
Description of problem:

CNV upgrades from v4.14.1 to v4.15.0 (unreleased) are not starting due to out of sync operatorCondition.

We see:

$ oc get csv
NAME                                       DISPLAY                    VERSION               REPLACES                                   PHASE
kubevirt-hyperconverged-operator.v4.14.1   OpenShift Virtualization   4.14.1                kubevirt-hyperconverged-operator.v4.14.0   Replacing
kubevirt-hyperconverged-operator.v4.15.0   OpenShift Virtualization   4.15.0                kubevirt-hyperconverged-operator.v4.14.1   Pending

And on the v4.15.0 CSV:

$ oc get csv kubevirt-hyperconverged-operator.v4.15.0 -o yaml
....
status:
  cleanup: {}
  conditions:
  - lastTransitionTime: "2023-12-19T01:50:48Z"
    lastUpdateTime: "2023-12-19T01:50:48Z"
    message: requirements not yet checked
    phase: Pending
    reason: RequirementsUnknown
  - lastTransitionTime: "2023-12-19T01:50:48Z"
    lastUpdateTime: "2023-12-19T01:50:48Z"
    message: 'operator is not upgradeable: the operatorcondition status "Upgradeable"="True"
      is outdated'
    phase: Pending
    reason: OperatorConditionNotUpgradeable
  lastTransitionTime: "2023-12-19T01:50:48Z"
  lastUpdateTime: "2023-12-19T01:50:48Z"
  message: 'operator is not upgradeable: the operatorcondition status "Upgradeable"="True"
    is outdated'
  phase: Pending
  reason: OperatorConditionNotUpgradeable

If we check the pending operator condition (v4.14.1), we see:

$ oc get operatorcondition kubevirt-hyperconverged-operator.v4.14.1 -o yaml
apiVersion: operators.coreos.com/v2
kind: OperatorCondition
metadata:
  creationTimestamp: "2023-12-16T17:10:17Z"
  generation: 18
  labels:
    operators.coreos.com/kubevirt-hyperconverged.openshift-cnv: ""
  name: kubevirt-hyperconverged-operator.v4.14.1
  namespace: openshift-cnv
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: true
    kind: ClusterServiceVersion
    name: kubevirt-hyperconverged-operator.v4.14.1
    uid: 7db79d4b-e69e-4af8-9335-6269cf004440
  resourceVersion: "4116127"
  uid: 347306c9-865a-42b8-b2c9-69192b0e350a
spec:
  conditions:
  - lastTransitionTime: "2023-12-18T18:47:23Z"
    message: ""
    reason: Upgradeable
    status: "True"
    type: Upgradeable
  deployments:
  - hco-operator
  - hco-webhook
  - hyperconverged-cluster-cli-download
  - cluster-network-addons-operator
  - virt-operator
  - ssp-operator
  - cdi-operator
  - hostpath-provisioner-operator
  - mtq-operator
  serviceAccounts:
  - hyperconverged-cluster-operator
  - cluster-network-addons-operator
  - kubevirt-operator
  - ssp-operator
  - cdi-operator
  - hostpath-provisioner-operator
  - mtq-operator
  - cluster-network-addons-operator
  - kubevirt-operator
  - ssp-operator
  - cdi-operator
  - hostpath-provisioner-operator
  - mtq-operator
status:
  conditions:
  - lastTransitionTime: "2023-12-18T09:41:06Z"
    message: ""
    observedGeneration: 11
    reason: Upgradeable
    status: "True"
    type: Upgradeable

where metadata.generation (18) is not in sync with status.conditions[*].observedGeneration (11).

Even manually editing spec.conditions.lastTransitionTime changes metadata.generation (as expected), but this doesn't trigger any reconciliation in OLM, so status.conditions[*].observedGeneration remains at 11.

$ oc get operatorcondition kubevirt-hyperconverged-operator.v4.14.1 -o yaml
apiVersion: operators.coreos.com/v2
kind: OperatorCondition
metadata:
  creationTimestamp: "2023-12-16T17:10:17Z"
  generation: 19
  labels:
    operators.coreos.com/kubevirt-hyperconverged.openshift-cnv: ""
  name: kubevirt-hyperconverged-operator.v4.14.1
  namespace: openshift-cnv
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: true
    kind: ClusterServiceVersion
    name: kubevirt-hyperconverged-operator.v4.14.1
    uid: 7db79d4b-e69e-4af8-9335-6269cf004440
  resourceVersion: "4147472"
  uid: 347306c9-865a-42b8-b2c9-69192b0e350a
spec:
  conditions:
  - lastTransitionTime: "2023-12-18T18:47:25Z"
    message: ""
    reason: Upgradeable
    status: "True"
    type: Upgradeable
  deployments:
  - hco-operator
  - hco-webhook
  - hyperconverged-cluster-cli-download
  - cluster-network-addons-operator
  - virt-operator
  - ssp-operator
  - cdi-operator
  - hostpath-provisioner-operator
  - mtq-operator
  serviceAccounts:
  - hyperconverged-cluster-operator
  - cluster-network-addons-operator
  - kubevirt-operator
  - ssp-operator
  - cdi-operator
  - hostpath-provisioner-operator
  - mtq-operator
  - cluster-network-addons-operator
  - kubevirt-operator
  - ssp-operator
  - cdi-operator
  - hostpath-provisioner-operator
  - mtq-operator
status:
  conditions:
  - lastTransitionTime: "2023-12-18T09:41:06Z"
    message: ""
    observedGeneration: 11
    reason: Upgradeable
    status: "True"
    type: Upgradeable

Since its observedGeneration is out of sync, this check:
https://github.com/operator-framework/operator-lifecycle-manager/blob/master/pkg/controller/operators/olm/operatorconditions.go#L44C1-L48

fails and the upgrade never starts.
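The kind of check described above can be sketched as follows. This is a simplified, hypothetical reconstruction for illustration; the `Condition` type and the `upgradeable` function are assumptions, not OLM's actual code. A condition is trusted only when its observedGeneration matches metadata.generation, so the 18 vs. 11 mismatch shown above always yields the "is outdated" error, however often spec.conditions is updated.

```go
package main

import "fmt"

// Condition is a hypothetical, trimmed-down OperatorCondition status condition.
type Condition struct {
	Type               string
	Status             string
	ObservedGeneration int64
}

// upgradeable reports whether the Upgradeable condition allows the upgrade.
// A condition whose ObservedGeneration differs from the object's
// metadata.generation is treated as stale and blocks the upgrade.
func upgradeable(generation int64, conds []Condition) (bool, error) {
	for _, c := range conds {
		if c.Type != "Upgradeable" {
			continue
		}
		if c.ObservedGeneration != generation {
			return false, fmt.Errorf("the operatorcondition status %q=%q is outdated", c.Type, c.Status)
		}
		return c.Status == "True", nil
	}
	// Assumption for this sketch: no Upgradeable condition means nothing blocks.
	return true, nil
}

func main() {
	conds := []Condition{{Type: "Upgradeable", Status: "True", ObservedGeneration: 11}}
	ok, err := upgradeable(18, conds)
	fmt.Println(ok, err)
	// false the operatorcondition status "Upgradeable"="True" is outdated
}
```

Because the check only compares generations, the only way out is for the controller to reconcile the OperatorCondition and bump observedGeneration, which is the step reported as not happening here.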

I suspect (I'm only guessing) that it could be a regression introduced with the memory optimization for https://issues.redhat.com/browse/OCPBUGS-17157 .

Version-Release number of selected component (if applicable):

    OCP 4.15.0-ec.3

How reproducible:

- Not reproducible (with the same CNV bundles) on OCP v4.14.z.
- Pretty high (but not 100%) on OCP 4.15.0-ec.3

Steps to Reproduce:

    1. Trigger a CNV v4.14.1 -> v4.15.0 upgrade on OCP 4.15.0-ec.3

Actual results:

    OLM does not react to changes to spec.conditions on the pending operator condition, so metadata.generation stays out of sync with status.conditions[*].observedGeneration and the CSV is reported as:

    message: 'operator is not upgradeable: the operatorcondition status "Upgradeable"="True"
      is outdated'
    phase: Pending
    reason: OperatorConditionNotUpgradeable

Expected results:

    OLM correctly reconciles the operatorCondition and the upgrade starts.

Additional info:

    Not reproducible with exactly the same bundle (origin and target) on OCP v4.14.z

https://github.com/openshift/operator-framework-olm/pull/644

Bug OCPBUGS-3541: When an ingresscontroller with empty/invalid spec is created and then deleted, "route_metrics_controller_routes_per_shard" metric displays incorrect value

Description of problem:

When an ingresscontroller is created with an empty spec (or with a spec.domain that clashes with an existing IC), its status shows Admitted as "False" with reason "Invalid". However, the "route_metrics_controller_routes_per_shard" metric still shows the shard in the Observe tab of the web console.

When the invalid ingresscontroller is deleted, the "route_metrics_controller_routes_per_shard" metric does not clear the row corresponding to the deleted IC.

Version-Release number of selected component (if applicable):

4.12.0-ec5

How reproducible:

Always

Steps to Reproduce:

1. Create the invalid IC with the following spec:

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: ic-invalid
  namespace: openshift-ingress-operator
spec: {}

2. Check the status of the IC:

$ oc get ingresscontroller -n openshift-ingress-operator ic-invalid -oyaml
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"operator.openshift.io/v1","kind":"IngressController","metadata":{"annotations":{},"name":"ic-invalid","namespace":"openshift-ingress-operator"},"spec":{}}
  creationTimestamp: "2022-11-11T12:53:41Z"
  generation: 1
  name: ic-invalid
  namespace: openshift-ingress-operator
  resourceVersion: "97453"
  uid: 96eae28e-bb14-447e-822f-602f3a3bb378
spec:
  httpEmptyRequestsPolicy: Respond
status:
  availableReplicas: 0
  conditions:
  - lastTransitionTime: "2022-11-11T12:53:41Z"
    message: 'conflicts with: default'
    reason: Invalid
    status: "False"
    type: Admitted
  domain: apps.arsen-cluster1.devcluster.openshift.com
  endpointPublishingStrategy:
    loadBalancer:
      dnsManagementPolicy: Managed
      providerParameters:
        aws:
          classicLoadBalancer:
            connectionIdleTimeout: 0s
          type: Classic
        type: AWS
      scope: External
    type: LoadBalancerService
  observedGeneration: 1
  selector: ""

3. Check the "route_metrics_controller_routes_per_shard" metric on the web-console

4. Delete the IC

5. Check the "route_metrics_controller_routes_per_shard" metric again on the web-console

Actual results:

As shown in the attached screenshot, "route_metrics_controller_routes_per_shard" metric adds one row for the
invalid IC. This is not cleared even when the IC is deleted.

Expected results:

The "route_metrics_controller_routes_per_shard" metric should not add a row for invalid ICs.
Additionally, when an invalid IC is deleted, the metric should clear the corresponding row.
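The expected behavior can be illustrated with a per-shard gauge that skips unadmitted IngressControllers and drops a shard's row on deletion. This sketch uses a plain Go map in place of a Prometheus gauge vector; `shardGauge`, `observe`, and `forget` are hypothetical names, not cluster-ingress-operator code. With the Prometheus Go client, the analogous calls would be `GaugeVec.WithLabelValues(...).Set(...)` and `GaugeVec.DeleteLabelValues(...)`.

```go
package main

import "fmt"

// shardGauge is a hypothetical stand-in for a Prometheus gauge vector
// keyed by shard (IngressController) name.
type shardGauge map[string]float64

// observe records the route count for a shard, but only for admitted
// IngressControllers; any row left over for an unadmitted shard is removed.
func (g shardGauge) observe(shard string, admitted bool, routes int) {
	if !admitted {
		delete(g, shard)
		return
	}
	g[shard] = float64(routes)
}

// forget removes the shard's row when its IngressController is deleted.
func (g shardGauge) forget(shard string) {
	delete(g, shard)
}

func main() {
	g := shardGauge{}
	g.observe("default", true, 12)
	g.observe("ic-invalid", false, 0) // Admitted=False: no row is added
	fmt.Println(len(g))               // 1
	g.forget("default")
	fmt.Println(len(g)) // 0
}
```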

Additional info:

https://github.com/openshift/cluster-ingress-operator/pull/869

Bug OCPBUGS-19271: Update 4.15 hypershift image to be consistent with ART

Please review the following PR: https://github.com/openshift/hypershift/pull/3017

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Bug OCPBUGS-23178: cloud-credential-operator cannot add new grants to deleted gcp role

Description of problem:

   The GCP Mint mode sync is failing when attempting to add permissions to a previously deleted custom role.

Version-Release number of selected component (if applicable):

    4.15

How reproducible:

    Always

Steps to Reproduce:

    1. Create a GCP cluster in Mint mode (with a CCO CredentialsRequest that has permissions defined)
    2. Delete the openshift-hive-dev-cloud-credential-operator-gcp-ro-creds custom role from GCP
    3. oc -n openshift-cloud-credential-operator delete secret cloud-credential-operator-gcp-ro-creds

Actual results:

    Receive the following error when attempting to add permissions to the deleted custom role: "cloud-credential-operator cannot add new grants to deleted gcp role"

Expected results:

    The new permissions should be added to the role without issue.

Additional info:

https://github.com/openshift/cloud-credential-operator/pull/637

Bug OCPBUGS-23706: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-autoscaler-operator/pull/307

Bug OCPBUGS-14787: Stop installing containernetworking-cni plugins in ovnk images

Nothing uses these plugins in the ovnk image, and having them complicates security checking, which has to use a different path to check RPMs than it uses for content built directly in the Dockerfile.

Since they're unused, just remove them.

https://github.com/openshift/ovn-kubernetes/pull/1702

Bug OCPBUGS-26043: Adding test case when exceed openshift.io/image-tags will ban to create new image references in the project

This is a clone of issue ~~OCPBUGS-25943~~. The following is the description of the original issue:
—
Description of problem:

Add a test case verifying that exceeding the openshift.io/image-tags limit prevents the creation of new image references in the project.

Version-Release number of selected component (if applicable):

    4.16

pr - https://github.com/openshift/origin/pull/28464

https://github.com/openshift/origin/pull/28492

Bug OCPBUGS-19155: Update 4.15 ose-csi-driver-shared-resource-webhook image to be consistent with ART

Please review the following PR: https://github.com/openshift/csi-driver-shared-resource/pull/142

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Bug OCPBUGS-19163: Update 4.15 ose-cloud-network-config-controller image to be consistent with ART

Please review the following PR: https://github.com/openshift/cloud-network-config-controller/pull/122

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cloud-network-config-controller/pull/122

Bug OCPBUGS-19834: In HCP cluster updating pull-secret in hosted cluster CR on HUB cluster is not reflecting on HCP cluster VMs

Description of problem:

The customer created hosted control plane (HCP, of type KubeVirt) clusters on a hub OCP cluster.

So that their workloads can pull images on the HCP cluster, they added auth for our registries to a secret named "scale-rm-pull-secret" in the "clusters" namespace on the hub cluster, and then referenced this secret in the HostedCluster CR for the HCP in question, also in the "clusters" namespace on the hub.

They expect this change to be reflected on the HCP cluster nodes so that images are pulled successfully. However, they keep getting ImagePullBackOff errors on the HCP cluster:

Pod: ibm-spectrum-scale-controller-manager-5cb84655b4-dvnxk
Namespace: ibm-spectrum-scale-operator
Generated from kubelet on scale-41312-t7nml
2 times in the last 0 minutes
Failed to pull image "icr.io/cpopen/ibm-spectrum-scale-operator@sha256:f6138abb5493d7ef6405dcf0a6bb5afc697cca9f20be1a88b3214268b6382da8": rpc error: code = Unknown desc = (Mirrors also failed: [cp.stg.icr.io/cp/ibm-spectrum-scale-operator@sha256:f6138abb5493d7ef6405dcf0a6bb5afc697cca9f20be1a88b3214268b6382da8: Requesting bearer token: invalid status code from registry 400 (Bad Request)] [docker-na-public.artifactory.swg-devops.com/sys-spectrum-scale-team-cloud-native-docker-local/ibm-spectrum-scale-operator@sha256:f6138abb5493d7ef6405dcf0a6bb5afc697cca9f20be1a88b3214268b6382da8: unable to retrieve auth token: invalid username/password: authentication required]): icr.io/cpopen/ibm-spectrum-scale-operator@sha256:f6138abb5493d7ef6405dcf0a6bb5afc697cca9f20be1a88b3214268b6382da8: reading manifest sha256:f6138abb5493d7ef6405dcf0a6bb5afc697cca9f20be1a88b3214268b6382da8 in icr.io/cpopen/ibm-spectrum-scale-operator: manifest unknown

The customer is able to pull the image manually using the same credentials:

podman pull docker-na-public.artifactory.swg-devops.com/sys-spectrum-scale-team-cloud-native-docker-local/ibm-spectrum-scale-operator@sha256:f6138abb5493d7ef6405dcf0a6bb5afc697cca9f20be1a88b3214268b6382da8

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:


Actual results:

Expected results:

Additional info:

The image was pulled manually on the nodes successfully after logging in to the registry with the same credentials, but the pod still reports that it cannot pull the image. Another thing to note is that the pod has imagePullPolicy set to "IfNotPresent", so it is unclear why it continues to throw the same error after the manual pull on all three nodes.
podman pull docker-na-public.artifactory.swg-devops.com/sys-spectrum-scale-team-cloud-native-docker-local/ibm-spectrum-scale-operator@sha256:f6138abb5493d7ef6405dcf0a6bb5afc697cca9f20be1a88b3214268b6382da8
Trying to pull docker-na-public.artifactory.swg-devops.com/sys-spectrum-scale-team-cloud-native-docker-local/ibm-spectrum-scale-operator@sha256:f6138abb5493d7ef6405dcf0a6bb5afc697cca9f20be1a88b3214268b6382da8...
Getting image source signatures
Copying blob 1e3d9b7d1452 skipped: already exists  
Copying blob fe5ca62666f0 skipped: already exists  
Copying blob e8c73c638ae9 skipped: already exists  
Copying blob fcb6f6d2c998 skipped: already exists  
Copying blob b02a7525f878 skipped: already exists  
Copying blob 4aa0ea1413d3 skipped: already exists  
Copying blob 7c881f9ab25e skipped: already exists  
Copying blob 5627a970d25e skipped: already exists  
Copying blob c7e34367abae skipped: already exists  
Copying blob f92848770344 skipped: already exists  
Copying blob a7ca0d9ba68f skipped: already exists  
Copying config 07120ff2fe done  
Writing manifest to image destination
Storing signatures
07120ff2fe00d6335ef757b33546fc9ec9e3d799a500349343f09228bcdf73c0
sh-5.1# 

Pod: ibm-spectrum-scale-controller-manager-5cb84655b4-dvnxk
Namespace: ibm-spectrum-scale-operator
21 Sept 2023, 17:58
Generated from kubelet on scale-41312-t7nml
2 times in the last 0 minutes
Failed to pull image "icr.io/cpopen/ibm-spectrum-scale-operator@sha256:f6138abb5493d7ef6405dcf0a6bb5afc697cca9f20be1a88b3214268b6382da8": rpc error: code = Unknown desc = (Mirrors also failed: [cp.stg.icr.io/cp/ibm-spectrum-scale-operator@sha256:f6138abb5493d7ef6405dcf0a6bb5afc697cca9f20be1a88b3214268b6382da8: Requesting bearer token: invalid status code from registry 400 (Bad Request)] [docker-na-public.artifactory.swg-devops.com/sys-spectrum-scale-team-cloud-native-docker-local/ibm-spectrum-scale-operator@sha256:f6138abb5493d7ef6405dcf0a6bb5afc697cca9f20be1a88b3214268b6382da8: unable to retrieve auth token: invalid username/password: authentication required]): icr.io/cpopen/ibm-spectrum-scale-operator@sha256:f6138abb5493d7ef6405dcf0a6bb5afc697cca9f20be1a88b3214268b6382da8: reading manifest sha256:f6138abb5493d7ef6405dcf0a6bb5afc697cca9f20be1a88b3214268b6382da8 in icr.io/cpopen/ibm-spectrum-scale-operator: manifest unknown

https://github.com/openshift/hypershift/pull/3237

Bug OCPBUGS-21672: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Bug OCPBUGS-24583: push violation regression check into the default requirement result

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:


Actual results:

Expected results:

Additional info:

https://github.com/openshift/origin/pull/28433

Bug OCPBUGS-25486: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/ovn-kubernetes/pull/1999

Task MON-3398: Request for sending new RHACM metric via Telemetry

Request for sending data via telemetry

The goal is to collect metrics about user page interaction to better understand how customers use the console, and in turn develop a better experience.

acm_console_page_count:sum

acm_console_page_count:sum represents a counter for page visits across the main product pages.

Labels

page, possible values are: overview-classic, overview-fleet, search, search-details, clusters, application, governance

The cardinality of the metric is at most 7 (one series for each of the 7 page labels listed above; a PrometheusRule sums the page visit counts across Pods).

https://github.com/openshift/cluster-monitoring-operator/pull/2100

Bug OCPBUGS-19175: Update 4.15 ose-cluster-kube-scheduler-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-kube-scheduler-operator/pull/493

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-kube-scheduler-operator/pull/493

Bug OCPBUGS-21984: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cloud-provider-alibaba-cloud/pull/41

Bug OCPBUGS-23463: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-kube-scheduler-operator/pull/512

Bug OCPBUGS-25947: Converting load balancer service from internal scope to external keeps internal load balancer IP on GCP

Reproducer:
1. On a GCP cluster, create an ingress controller with internal load balancer scope, like this:

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: foo
  namespace: openshift-ingress-operator
spec:
  domain: foo.<cluster-domain>
  endpointPublishingStrategy:
    type: LoadBalancerService
    loadBalancer:
      dnsManagementPolicy: Managed
      scope: Internal

2. Wait for the load balancer service to finish rolling out:

$ oc -n openshift-ingress get service router-foo
NAME         TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
router-foo   LoadBalancer   172.30.101.233   10.0.128.5    80:32019/TCP,443:32729/TCP   81s

3. Edit the ingress controller to set spec.endpointPublishingStrategy.loadBalancer.scope to External.

The load balancer service (router-foo in this case) should get an external IP address, but it currently keeps the 10.x.x.x address that was already assigned.
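A hypothetical check that a test could use to spot this symptom: after switching to External scope, the service still reports a private address. This sketch assumes that Python's `ipaddress.is_private` is a good enough proxy for "internal" on GCP; the function name is made up, and this is not the cloud-provider-gcp fix.

```python
import ipaddress


def scope_mismatch(scope: str, ingress_ip: str) -> bool:
    """Return True if the load balancer IP does not match the requested scope.

    Assumption: an External scope should yield a non-private address and an
    Internal scope a private one (RFC 1918 and similar ranges).
    """
    addr = ipaddress.ip_address(ingress_ip)
    if scope == "External":
        return addr.is_private
    if scope == "Internal":
        return not addr.is_private
    raise ValueError(f"unknown scope: {scope!r}")


# The router-foo service above still reports 10.0.128.5 after the switch.
print(scope_mismatch("External", "10.0.128.5"))  # True
```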

https://github.com/openshift/cloud-provider-gcp/pull/56

Bug OCPBUGS-19437: API docs content issue

Description of problem:

Because the original PR has already been merged, this new bug tracks the remaining issues in the docs.

Version-Release number of selected component (if applicable):

How reproducible:

Always

Steps to Reproduce:

1. Table formatting issue for 'useActiveColumns', 'k8sGetResource', 'k8sDeleteResource', 'k8sListResource', 'k8sUpdateResource' and 'k8sPatchResource'
Attached: https://drive.google.com/file/d/1NgitSi9mgB3zluqmp8eza4DhFOVY-Pt9/view?usp=drive_link
2. The 'code' text is not highlighted in 'getGroupVersionKindForModel'
Attached: https://drive.google.com/file/d/1sVxXdlIBxKxxokZX2iorJOER7ILGByzm/view?usp=drive_link
3. Incorrect </br> usage in 'ErrorBoundaryFallbackPage'
Attached: https://drive.google.com/file/d/1ubhcFb68kDwL-wKsknP1Hb0fos480OnA/view?usp=drive_link
4. Several links still show the raw {@link} label: ListPageCreate, useK8sModel, k8sGetResource, k8sDeleteResource, k8sListResource, k8sListResourceItems, YAMLEditor

Actual results:

Expected results:

Additional info:

Impacted code lines:
https://github.com/Mylanos/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#L616-L619
https://github.com/Mylanos/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#L1277-L1283
https://github.com/Mylanos/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#L1404-L1410
https://github.com/Mylanos/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#L1434-L1437
https://github.com/Mylanos/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#L1335-L1341
https://github.com/Mylanos/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#L1365-L1370
https://github.com/Mylanos/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#L1528
https://github.com/Mylanos/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#L2157
https://github.com/Mylanos/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#L698
https://github.com/Mylanos/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#L1035
https://github.com/Mylanos/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#L1452
https://github.com/Mylanos/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#L2480

Bug OCPBUGS-26060: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cloud-provider-aws/pull/64

Bug OCPBUGS-22839: Failed to create the sandbox-plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): failed to send CNI request: Post "http://dummy/cni": EOF [Release-4.15]

Description of problem:

In the 4.14 z-stream rollback job, the test case "[sig-network] pods should successfully create sandboxes by adding pod to network" fails.

The job link is here https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-ovn-upgrade-rollback-oldest-supported/1719037590788640768

The error is:

56 failures to create the sandbox

ns/openshift-monitoring pod/prometheus-k8s-1 node/ip-10-0-48-75.us-east-2.compute.internal - 3314.57 seconds after deletion - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-k8s-1_openshift-monitoring_95d1a457-3e1b-4ae3-8b57-8023eec5937d_0(5b36bc12b2964e85bcdbe60b275d6a12ea68cb18b81f16622a6cb686270c4eb3): error adding pod openshift-monitoring_prometheus-k8s-1 to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): failed to send CNI request: Post "http://dummy/cni": EOF
ns/openshift-monitoring pod/prometheus-k8s-1 node/ip-10-0-48-75.us-east-2.compute.internal - 3321.57 seconds after deletion - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-k8s-1_openshift-monitoring_95d1a457-3e1b-4ae3-8b57-8023eec5937d_0(3cc0afc5bec362566e4c3bdaf822209377102c2e39aaa8ef5d99b0f4ba795aaf): error adding pod openshift-monitoring_prometheus-k8s-1 to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): failed to send CNI request: Post "http://dummy/cni": dial unix /run/multus/socket/multus.sock: connect: connection refused

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-10-30-170011

How reproducible:

Flaky

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

The rollback test installs 4.14.0, upgrades to the latest 4.14 nightly, and then, at some random point, rolls back to 4.14.0.

Bug OCPBUGS-24109: Update 4.15 ose-ibm-cloud-controller-manager-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cloud-provider-ibm/pull/60

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cloud-provider-ibm/pull/60

Bug OCPBUGS-24140: Update 4.15 ose-cluster-control-plane-machine-set-operator-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/266

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/266

Bug OCPBUGS-18392: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-network-operator/pull/1989

Bug OCPBUGS-18954: Many SNOs failed to complete install because "the cluster operator cluster-autoscaler is not available"

Description of problem:

While installing 3618 SNOs via ZTP using ACM 2.9, 15 clusters failed to complete installation because the cluster-autoscaler operator was not available. This represents the bulk of all cluster install failures in this testbed for OCP 4.14.0-rc.0.


# cat aci.InstallationFailed.autoscaler  | xargs -I % sh -c "echo -n '% '; oc --kubeconfig /root/hv-vm/kc/%/kubeconfig get clusterversion --no-headers "
vm00527 version         False   True   20h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm00717 version         False   True   14h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm00881 version         False   True   19h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm00998 version         False   True   18h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm01006 version         False   True   17h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm01059 version         False   True   15h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm01155 version         False   True   14h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm01930 version         False   True   17h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm02407 version         False   True   16h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm02651 version         False   True   18h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm03073 version         False   True   19h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm03258 version         False   True   20h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm03295 version         False   True   14h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm03303 version         False   True   15h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm03517 version         False   True   18h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available

Version-Release number of selected component (if applicable):

Hub 4.13.11
Deployed SNOs 4.14.0-rc.0
ACM 2.9 - 2.9.0-DOWNSTREAM-2023-09-07-04-47-52

How reproducible:

15 out of 20 failures (75% of the failures)
15 out of 3618 attempted SNO installs (~0.4% of all installs)
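The percentages above can be recomputed from the output of the `oc get clusterversion` loop. This is a minimal Python sketch. It assumes that each output line starts with the cluster name, as printed by the `echo -n '% '` prefix. The function name is hypothetical.

```python
FAILURE_MSG = "the cluster operator cluster-autoscaler is not available"


def autoscaler_failures(lines):
    """Return cluster names whose ClusterVersion message mentions the autoscaler.

    Assumes each line looks like "<cluster> version False True ... <message>".
    """
    return [line.split()[0] for line in lines if FAILURE_MSG in line]


lines = [
    "vm00527 version False True 20h Unable to apply 4.14.0-rc.0: "
    "the cluster operator cluster-autoscaler is not available",
    "vm00717 version False True 14h Unable to apply 4.14.0-rc.0: "
    "the cluster operator cluster-autoscaler is not available",
]
print(autoscaler_failures(lines))  # ['vm00527', 'vm00717']

# Numbers from this report: 15 autoscaler failures, 20 failures in total,
# 3618 attempted installs.
print(f"{15 / 20:.0%} of failures, {15 / 3618:.2%} of installs")
# 75% of failures, 0.41% of installs
```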

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

Some of the affected clusters show errors in the cluster-autoscaler-operator logs, for example:

I0912 19:54:39.962897       1 main.go:15] Go Version: go1.20.5 X:strictfipsruntime
I0912 19:54:39.962977       1 main.go:16] Go OS/Arch: linux/amd64
I0912 19:54:39.962982       1 main.go:17] Version: cluster-autoscaler-operator v4.14.0-202308301903.p0.gb57f5a9.assembly.stream-dirty
I0912 19:54:39.963137       1 leaderelection.go:122] The leader election gives 4 retries and allows for 30s of clock skew. The kube-apiserver downtime tolerance is 78s. Worst non-graceful lease acquisition is 2m43s. Worst graceful lease acquisition is {26s}.
I0912 19:54:39.975478       1 listener.go:44] controller-runtime/metrics "msg"="Metrics server is starting to listen" "addr"="127.0.0.1:9191"
I0912 19:54:39.976939       1 server.go:187] controller-runtime/webhook "msg"="Registering webhook" "path"="/validate-clusterautoscalers"
I0912 19:54:39.976984       1 server.go:187] controller-runtime/webhook "msg"="Registering webhook" "path"="/validate-machineautoscalers"
I0912 19:54:39.977082       1 main.go:41] Starting cluster-autoscaler-operator
I0912 19:54:39.977216       1 server.go:216] controller-runtime/webhook/webhooks "msg"="Starting webhook server" 
I0912 19:54:39.977693       1 certwatcher.go:161] controller-runtime/certwatcher "msg"="Updated current TLS certificate" 
I0912 19:54:39.977813       1 server.go:273] controller-runtime/webhook "msg"="Serving webhook server" "host"="" "port"=8443
I0912 19:54:39.977938       1 certwatcher.go:115] controller-runtime/certwatcher "msg"="Starting certificate watcher" 
I0912 19:54:39.978008       1 server.go:50]  "msg"="starting server" "addr"={"IP":"127.0.0.1","Port":9191,"Zone":""} "kind"="metrics" "path"="/metrics"
I0912 19:54:39.978052       1 leaderelection.go:245] attempting to acquire leader lease openshift-machine-api/cluster-autoscaler-operator-leader...
I0912 19:54:39.982052       1 leaderelection.go:255] successfully acquired lease openshift-machine-api/cluster-autoscaler-operator-leader
I0912 19:54:39.983412       1 controller.go:177]  "msg"="Starting EventSource" "controller"="cluster_autoscaler_controller" "source"="kind source: *v1.ClusterAutoscaler"
I0912 19:54:39.983462       1 controller.go:177]  "msg"="Starting EventSource" "controller"="cluster_autoscaler_controller" "source"="kind source: *v1.Deployment"
I0912 19:54:39.983483       1 controller.go:177]  "msg"="Starting EventSource" "controller"="cluster_autoscaler_controller" "source"="kind source: *v1.Service"
I0912 19:54:39.983501       1 controller.go:177]  "msg"="Starting EventSource" "controller"="cluster_autoscaler_controller" "source"="kind source: *v1.ServiceMonitor"
I0912 19:54:39.983520       1 controller.go:177]  "msg"="Starting EventSource" "controller"="cluster_autoscaler_controller" "source"="kind source: *v1.PrometheusRule"
I0912 19:54:39.983532       1 controller.go:185]  "msg"="Starting Controller" "controller"="cluster_autoscaler_controller"
I0912 19:54:39.986041       1 controller.go:177]  "msg"="Starting EventSource" "controller"="machine_autoscaler_controller" "source"="kind source: *v1beta1.MachineAutoscaler"
I0912 19:54:39.986065       1 controller.go:177]  "msg"="Starting EventSource" "controller"="machine_autoscaler_controller" "source"="kind source: *unstructured.Unstructured"
I0912 19:54:39.986072       1 controller.go:185]  "msg"="Starting Controller" "controller"="machine_autoscaler_controller"
I0912 19:54:40.095808       1 webhookconfig.go:72] Webhook configuration status: created
I0912 19:54:40.101613       1 controller.go:219]  "msg"="Starting workers" "controller"="cluster_autoscaler_controller" "worker count"=1
I0912 19:54:40.102857       1 controller.go:219]  "msg"="Starting workers" "controller"="machine_autoscaler_controller" "worker count"=1
E0912 19:58:48.113290       1 leaderelection.go:327] error retrieving resource lock openshift-machine-api/cluster-autoscaler-operator-leader: Get "https://[fd02::1]:443/apis/coordination.k8s.io/v1/namespaces/openshift-machine-api/leases/cluster-autoscaler-operator-leader": net/http: TLS handshake timeout - error from a previous attempt: unexpected EOF
E0912 20:02:48.135610       1 leaderelection.go:327] error retrieving resource lock openshift-machine-api/cluster-autoscaler-operator-leader: Get "https://[fd02::1]:443/apis/coordination.k8s.io/v1/namespaces/openshift-machine-api/leases/cluster-autoscaler-operator-leader": dial tcp [fd02::1]:443: connect: connection refused
E0913 13:49:02.118757       1 leaderelection.go:327] error retrieving resource lock openshift-machine-api/cluster-autoscaler-operator-leader: Get "https://[fd02::1]:443/apis/coordination.k8s.io/v1/namespaces/openshift-machine-api/leases/cluster-autoscaler-operator-leader": dial tcp [fd02::1]:443: connect: connection refused

https://github.com/openshift/cluster-autoscaler-operator/pull/285

Bug MGMT-16052: Re-creating AgentServiceConfig after deploying VSphere platform spoke results in assisted-service crashing

Description of the problem:

After a vSphere platform spoke is installed from the infrastructure operator, deleting and re-creating the AgentServiceConfig makes the assisted-service pod crash repeatedly without recovering.

How reproducible:

100%

Steps to reproduce:

1. Install a spoke cluster with platformType: VSphere

2. Delete and re-create the agentserviceconfig

Actual results:

The assisted-service pod panics with a nil pointer dereference.

Expected results:

The assisted-service pod starts correctly and the vSphere cluster can continue to be managed.

Workaround:
Delete all of the cluster resources related to the vSphere spoke cluster.

https://github.com/openshift/assisted-service/pull/5659

Bug OCPBUGS-19108: Update 4.15 prometheus-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/prometheus-operator/pull/242

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/prometheus-operator/pull/242

Bug OCPBUGS-20479: Ignore pod sandbox creation failures due to networking when the node is NetworkUnavailable=true

The test:

[sig-network] pods should successfully create sandboxes by adding pod to network

The test failed on a couple of payloads today, with 1-2 failures per batch of 10 aggregated jobs. The most recent errors I looked at are often the same:

1 failures to create the sandbox

ns/openshift-monitoring pod/prometheus-k8s-1 node/ip-10-0-24-217.us-west-1.compute.internal - 475.52 seconds after deletion - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-k8s-1_openshift-monitoring_c712fc61-5a1e-4cec-b6fa-18c8f2e91c0a_0(46df8384ffeb433fc0e4864262aa52f2ede570265c43bf8b0900f184b27b10f1): error adding pod openshift-monitoring_prometheus-k8s-1 to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): failed to send CNI request: Post "http://dummy/cni": EOF

This http://dummy/cni URL looked interesting and seemed worthy of a bug.
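The `dummy` host is probably just a placeholder. The OCPBUGS-22839 error earlier in this list reads `Post "http://dummy/cni": dial unix /run/multus/socket/multus.sock: connect: connection refused`, which shows that the shim sends the HTTP request over a Unix domain socket. Below is a minimal Python sketch of that pattern, for illustration. The names are hypothetical, and this is not the multus-shim code. In this sketch, a server that closes the connection without answering (the `EOF` case) would surface as `http.client.RemoteDisconnected`. A missing listener would surface as an `OSError` such as `ConnectionRefusedError`.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that dials a Unix domain socket instead of TCP.

    The host given to HTTPConnection ("dummy") only ends up in the Host
    header; the bytes actually travel over the socket at `socket_path`.
    """

    def __init__(self, socket_path: str, timeout: float = 5.0):
        super().__init__("dummy", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.settimeout(self.timeout)
            sock.connect(self.socket_path)
        except OSError:
            sock.close()
            raise
        self.sock = sock


def post_cni(socket_path: str, body: bytes) -> int:
    """POST `body` to http://dummy/cni over the Unix socket; return the status."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("POST", "/cni", body=body,
                     headers={"Content-Type": "application/json"})
        return conn.getresponse().status
    finally:
        conn.close()
```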

The failure is rare overall but happens quite frequently day to day: search.ci shows many hits over the last two days in both 4.14 and 4.15, seemingly on both OVN and SDN:

https://search.ci.openshift.org/?search=Post+%22http%3A%2F%2Fdummy%2Fcni%22%3A+EOF&maxAge=48h&context=1&type=bug%2Bissue%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Some of these will show as flakes as the test gets retried at times and then passes.

Additionally, in 4.14 we are seeing similar failures reporting:

No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?

4.14.0-0.nightly-2023-10-12-015817 shows pod sandbox errors; Azure and AWS both show a drop from the 10th, which comes after our force accept.

4.14.0-0.nightly-2023-10-11-141212 had a host of failures, but this is what killed AWS SDN.

4.14.0-0.nightly-2023-10-11-200059 shows it on AWS SDN as well, and also on Azure.
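The following is a sketch of the filtering that the bug title asks for, under stated assumptions. The event and node-condition shapes are hypothetical, and so is the "CNI appears in the message" heuristic. This is not how the origin test is implemented.

```python
from dataclasses import dataclass


@dataclass
class SandboxFailure:
    """One FailedCreatePodSandBox event (hypothetical shape)."""
    namespace: str
    pod: str
    node: str
    message: str


def is_network_failure(failure: SandboxFailure) -> bool:
    # Heuristic: the errors in this report all come from the CNI plugin chain.
    return "CNI" in failure.message


def actionable_failures(failures, node_conditions):
    """Drop networking sandbox failures on nodes reporting NetworkUnavailable=True.

    `node_conditions` maps node name -> {condition type: status}, where the
    status is "True", "False" or "Unknown" as in a Node's status.conditions.
    """
    kept = []
    for failure in failures:
        conditions = node_conditions.get(failure.node, {})
        if conditions.get("NetworkUnavailable") == "True" and is_network_failure(failure):
            continue  # node networking is known to be down, so this is expected
        kept.append(failure)
    return kept


failures = [
    SandboxFailure("openshift-monitoring", "prometheus-k8s-1", "node-a",
                   'error adding pod to CNI network "multus-cni-network"'),
    SandboxFailure("openshift-monitoring", "prometheus-k8s-0", "node-b",
                   'error adding pod to CNI network "multus-cni-network"'),
]
conditions = {"node-a": {"NetworkUnavailable": "True"},
              "node-b": {"NetworkUnavailable": "False"}}
print([f.pod for f in actionable_failures(failures, conditions)])  # ['prometheus-k8s-0']
```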

https://github.com/openshift/origin/pull/28366

Bug OCPBUGS-23350: HostedControlPlane Nodeport service is not opened in a dualstack deployment

Description of problem:

After extensive debugging of HostedControlPlanes in dual-stack mode, we found that the QE department has issues in dual-stack environments.

In HyperShift/HostedControlPlane, there is an HAProxy in the data plane (the worker nodes of the HostedCluster). This HAProxy is unable to forward calls to the KubeApiServer in the control plane. It tries to connect over both protocols, IPv6 first and then IPv4. The issue is that the HostedCluster exposes its services in NodePort mode, and the master nodes of the management cluster appear to open these NodePorts only on IPv4, not on IPv6.
Even though netstat on the master node shows the port listening:

tcp6 9 0 :::32272 :::* LISTEN 6086/ovnkube

the port seems to be open only on IPv4, because connecting to the API over IPv6 fails, even locally. This only happens in dual-stack mode; single-stack IPv4 and single-stack IPv6 both work correctly.
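One way to narrow this down is to probe the same NodePort over each address family, both from the node itself and from outside. Below is a minimal Python sketch. The helper names are hypothetical. The addresses and port are placeholders for the management cluster's node IPs and the NodePort, for example 32272 from the netstat output above.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    `host` may be an IPv4 or IPv6 literal; getaddrinfo picks the matching
    address family, so the same helper can probe both stacks.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except OSError:
        return False
    for family, socktype, proto, _canonname, addr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(addr)
                return True
        except OSError:
            continue
    return False


def probe_dual_stack(ipv4: str, ipv6: str, port: int) -> dict:
    """Probe one port over IPv4 and IPv6 and report both results."""
    return {"ipv4": can_connect(ipv4, port), "ipv6": can_connect(ipv6, port)}
```

For example, `probe_dual_stack("192.0.2.10", "2001:db8::10", 32272)` with real node addresses substituted. In the failing setup described here, `ipv4` would be expected to come back True and `ipv6` False.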

Version-Release number of selected component (if applicable):

4.14.X
4.15.X

How reproducible:

100%

Steps to Reproduce:

1. Deploy an OpenShift management cluster in dual stack mode
2. Deploy MCE 2.4
3. Deploy a HostedCluster in dual stack mode

Actual results:

- Many pods stuck in ContainerCreating state
- The HostedCluster cannot be deployed; many cluster operators (COs) are blocked and the ClusterVersion is also stuck

Expected results:

HostedCluster deployment done

Additional info:

To reproduce the issue, contact @jparrill or @Liangquan Li on Slack; this makes creating the environment easier.

https://github.com/openshift/hypershift/pull/3210

Bug OCPBUGS-18455: Unable to disable external CCM for platform external

Description of problem:

Some 3rd party clouds do not require the use of an external CCM. The installer enables an external CCM by default whenever the platform is external.

Version-Release number of selected component (if applicable):

4.14 nightly

How reproducible:

Steps to Reproduce:

1.
2.
3.

Actual results:

The external CCM cannot be disabled when the platform type is external.

Expected results:

It should be possible to disable the external CCM when the platform type is external.

Additional info:

https://github.com/openshift/installer/pull/7533

Bug OCPBUGS-23327: file path used for oci images can result in an error

Description of problem:

When running oc mirror with an OCI path, you can end up in an error state when the destination is file://<path> (i.e. mirror to disk).

Version-Release number of selected component (if applicable):

4.14.2

How reproducible:

always

Steps to Reproduce:

At IBM we use the ibm-pak tool to generate an OCI catalog, but this bug is also reproducible with a simple skopeo copy. Once you've copied the image locally, you can move it around with file system copy commands to test this in different ways.

1. Make a directory structure like this to simulate how ibm-pak creates its own catalogs. The problem seems to be related to the path you use, so this represents the failure case:

mkdir -p /root/.ibm-pak/data/publish/latest/catalog-oci/manifest-list

2. Create a location where the local storage will live:

mkdir -p /root/.ibm-pak/oc-mirror-storage

3. Next, copy the image locally using skopeo:

skopeo copy docker://icr.io/cpopen/ibm-zcon-zosconnect-catalog@sha256:8d28189637b53feb648baa6d7e3dd71935656a41fd8673292163dd750ef91eec oci:///root/.ibm-pak/data/publish/latest/catalog-oci/manifest-list --all --format v2s2

4. You can copy the OCI catalog content to a location where things will work properly so you can see a working example:

cp -r /root/.ibm-pak/data/publish/latest/catalog-oci/manifest-list /root/ibm-zcon-zosconnect-catalog

5. You'll need an ImageSetConfiguration (ISC). I've included both OCI references in the example (the commented-out one works, but the oci:///root/.ibm-pak/data/publish/latest/catalog-oci/manifest-list reference fails).

kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
mirror:
  operators:
  - catalog: oci:///root/.ibm-pak/data/publish/latest/catalog-oci/manifest-list
  #- catalog: oci:///root/ibm-zcon-zosconnect-catalog
    packages:
    - name: ibm-zcon-zosconnect
      channels:
      - name: v1.0
    full: true
    targetTag: 27ba8e
    targetCatalog: ibm-catalog
storageConfig:
  local:
    path: /root/.ibm-pak/oc-mirror-storage

6. Run oc mirror (remember that the ISC has OCI references for both the good and the bad scenario). You may want to change your working directory between running the good and bad examples.

oc mirror --config /root/.ibm-pak/data/publish/latest/image-set-config.yaml file://zcon --dest-skip-tls --max-per-registry=6

Actual results:


Logging to .oc-mirror.log
Found: zcon/oc-mirror-workspace/src/publish
Found: zcon/oc-mirror-workspace/src/v2
Found: zcon/oc-mirror-workspace/src/charts
Found: zcon/oc-mirror-workspace/src/release-signatures
error: ".ibm-pak/data/publish/latest/catalog-oci/manifest-list/kubebuilder/kube-rbac-proxy@sha256:db06cc4c084dd0253134f156dddaaf53ef1c3fb3cc809e5d81711baa4029ea4c" is not a valid image reference: invalid reference format

Expected results:


Simple example where things were working with the oci:///root/ibm-zcon-zosconnect-catalog reference (this was executed in the same workspace so no new images were detected).

Logging to .oc-mirror.log
Found: zcon/oc-mirror-workspace/src/publish
Found: zcon/oc-mirror-workspace/src/v2
Found: zcon/oc-mirror-workspace/src/charts
Found: zcon/oc-mirror-workspace/src/release-signatures
3 related images processed in 668.063974ms
Writing image mapping to zcon/oc-mirror-workspace/operators.1700092336/manifests-ibm-zcon-zosconnect-catalog/mapping.txt
No new images detected, process stopping

Additional info:


I debugged the error that happened and captured one of the instances where the ParseReference call fails. This is only for reference to help narrow down the issue.

github.com/openshift/oc/pkg/cli/image/imagesource.ParseReference (/root/go/src/openshift/oc-mirror/vendor/github.com/openshift/oc/pkg/cli/image/imagesource/reference.go:111)
github.com/openshift/oc-mirror/pkg/image.ParseReference (/root/go/src/openshift/oc-mirror/pkg/image/image.go:79)
github.com/openshift/oc-mirror/pkg/cli/mirror.(*MirrorOptions).addRelatedImageToMapping (/root/go/src/openshift/oc-mirror/pkg/cli/mirror/fbc_operators.go:194)
github.com/openshift/oc-mirror/pkg/cli/mirror.(*OperatorOptions).plan.func3 (/root/go/src/openshift/oc-mirror/pkg/cli/mirror/operator.go:575)
golang.org/x/sync/errgroup.(*Group).Go.func1 (/root/go/src/openshift/oc-mirror/vendor/golang.org/x/sync/errgroup/errgroup.go:75)
runtime.goexit (/usr/local/go/src/runtime/asm_amd64.s:1594)

Also, we use a period in the path (i.e. .ibm-pak), and I wonder whether that is causing the issue. This is just a guess and something to consider. *FOLLOWUP*: I removed the period from ".ibm-pak" and that seemed to make the error go away.

https://github.com/openshift/oc-mirror/pull/756

Bug OCPBUGS-21592: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/operator-framework/operator-marketplace/pull/546

Bug OCPBUGS-16607: [4.16] Number of clusters failing install on Ironic Inspection has increased with 502 proxy error in logs

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/ironic-agent-image/pull/100

Bug OCPBUGS-19244: Update 4.15 ose-azure-workload-identity-webhook image to be consistent with ART

Please review the following PR: https://github.com/openshift/azure-workload-identity/pull/6

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/azure-workload-identity/pull/6

Bug OCPBUGS-22628: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/ibm-powervs-block-csi-driver-operator/pull/49

Bug OCPBUGS-9303: Install does not begin if secure boot was enabled for the first time

Description of problem:
If secure boot is currently disabled and the user attempts to enable it via ZTP, the installation does not begin the first time ZTP is triggered.

When secure boot is enabled via ZTP, the boot options are configured before the virtual CD is attached, so the first boot goes into the existing HD with secure boot on. The installation then gets stuck because booting from the CD is never triggered.

Version-Release number of selected component (if applicable):
4.10

How reproducible:
Always

Steps to Reproduce:
1. Secure boot is currently disabled in the BIOS
2. Attempt to deploy a cluster with secure boot enabled via ZTP
3.

Actual results:

The spoke cluster booted into the existing HD with the secure boot option toggled.
The spoke cluster did not boot into the virtual CD, so the installation never started.
The agentclusterinstall gets stuck here:
State: insufficient
State Info: Cluster is not ready for install

Expected results:

The installation starts and completes successfully.

Additional info:

Secure boot config used in ZTP siteconfig:
http://registry.kni-qe-0.lab.eng.rdu2.redhat.com:3000/kni-qe/ztp-site-configs/src/ff814164cdcd355ed980f1edf269dbc2afbe09aa/siteconfig/master-2.yaml#L40

Bug OCPBUGS-17380: IPSec enablement is broken on OVNK

Description of problem:

Enable IPsec before or after installation on an OVN IC cluster:

$ oc patch networks.operator.openshift.io cluster --type=merge -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipsecConfig":{ }}}}}'
network.operator.openshift.io/cluster patched


The ovn-ipsec containers report:

ovs-monitor-ipsec | ERR | Failed to import certificate into NSS.
b'certutil:  unable to open "/etc/openvswitch/keys/ipsec-cacert.pem" for reading (-5950, 2).\n'



$ oc rsh ovn-ipsec-d7rx9
Defaulted container "ovn-ipsec" out of: ovn-ipsec, ovn-keys (init)
sh-5.1# certutil -L -d /var/lib/ipsec/nss

Certificate Nickname                                         Trust Attributes
                                                             SSL,S/MIME,JAR/XPI

ovs_certkey_db961f9a-7de4-4f1d-a2fb-a8306d4079c5             u,u,u

sh-5.1# cat /var/log/openvswitch/libreswan.log
Aug  4 15:12:46.808394: Initializing NSS using read-write database "sql:/var/lib/ipsec/nss"
Aug  4 15:12:46.837350: FIPS Mode: NO
Aug  4 15:12:46.837370: NSS crypto library initialized
Aug  4 15:12:46.837387: FIPS mode disabled for pluto daemon
Aug  4 15:12:46.837390: FIPS HMAC integrity support [disabled]
Aug  4 15:12:46.837541: libcap-ng support [enabled]
Aug  4 15:12:46.837550: Linux audit support [enabled]
Aug  4 15:12:46.837576: Linux audit activated
Aug  4 15:12:46.837580: Starting Pluto (Libreswan Version 4.9 IKEv2 IKEv1 XFRM XFRMI esp-hw-offload FORK PTHREAD_SETSCHEDPRIO GCC_EXCEPTIONS NSS (IPsec profile) (NSS-KDF) DNSSEC SYSTEMD_WATCHDOG LABELED_IPSEC (SELINUX) SECCOMP LIBCAP_NG LINUX_AUDIT AUTH_PAM NETWORKMANAGER CURL(non-NSS) LDAP(non-NSS)) pid:147
Aug  4 15:12:46.837583: core dump dir: /run/pluto
Aug  4 15:12:46.837585: secrets file: /etc/ipsec.secrets
Aug  4 15:12:46.837587: leak-detective enabled
Aug  4 15:12:46.837589: NSS crypto [enabled]
Aug  4 15:12:46.837591: XAUTH PAM support [enabled]
Aug  4 15:12:46.837604: initializing libevent in pthreads mode: headers: 2.1.12-stable (2010c00); library: 2.1.12-stable (2010c00)
Aug  4 15:12:46.837664: NAT-Traversal support  [enabled]
Aug  4 15:12:46.837803: Encryption algorithms:
Aug  4 15:12:46.837814:   AES_CCM_16         {256,192,*128} IKEv1:     ESP     IKEv2:     ESP     FIPS              aes_ccm, aes_ccm_c
Aug  4 15:12:46.837820:   AES_CCM_12         {256,192,*128} IKEv1:     ESP     IKEv2:     ESP     FIPS              aes_ccm_b
Aug  4 15:12:46.837826:   AES_CCM_8          {256,192,*128} IKEv1:     ESP     IKEv2:     ESP     FIPS              aes_ccm_a
Aug  4 15:12:46.837831:   3DES_CBC           [*192]         IKEv1: IKE ESP     IKEv2: IKE ESP     FIPS NSS(CBC)     3des
Aug  4 15:12:46.837837:   CAMELLIA_CTR       {256,192,*128} IKEv1:     ESP     IKEv2:     ESP                      
Aug  4 15:12:46.837843:   CAMELLIA_CBC       {256,192,*128} IKEv1: IKE ESP     IKEv2: IKE ESP          NSS(CBC)     camellia
Aug  4 15:12:46.837849:   AES_GCM_16         {256,192,*128} IKEv1:     ESP     IKEv2: IKE ESP     FIPS NSS(GCM)     aes_gcm, aes_gcm_c
Aug  4 15:12:46.837855:   AES_GCM_12         {256,192,*128} IKEv1:     ESP     IKEv2: IKE ESP     FIPS NSS(GCM)     aes_gcm_b
Aug  4 15:12:46.837861:   AES_GCM_8          {256,192,*128} IKEv1:     ESP     IKEv2: IKE ESP     FIPS NSS(GCM)     aes_gcm_a
Aug  4 15:12:46.837867:   AES_CTR            {256,192,*128} IKEv1: IKE ESP     IKEv2: IKE ESP     FIPS NSS(CTR)     aesctr
Aug  4 15:12:46.837872:   AES_CBC            {256,192,*128} IKEv1: IKE ESP     IKEv2: IKE ESP     FIPS NSS(CBC)     aes
Aug  4 15:12:46.837878:   NULL_AUTH_AES_GMAC {256,192,*128} IKEv1:     ESP     IKEv2:     ESP     FIPS              aes_gmac
Aug  4 15:12:46.837883:   NULL               []             IKEv1:     ESP     IKEv2:     ESP                      
Aug  4 15:12:46.837889:   CHACHA20_POLY1305  [*256]         IKEv1:             IKEv2: IKE ESP          NSS(AEAD)    chacha20poly1305
Aug  4 15:12:46.837892: Hash algorithms:
Aug  4 15:12:46.837896:   MD5                               IKEv1: IKE         IKEv2:                  NSS         
Aug  4 15:12:46.837901:   SHA1                              IKEv1: IKE         IKEv2: IKE         FIPS NSS          sha
Aug  4 15:12:46.837906:   SHA2_256                          IKEv1: IKE         IKEv2: IKE         FIPS NSS          sha2, sha256
Aug  4 15:12:46.837910:   SHA2_384                          IKEv1: IKE         IKEv2: IKE         FIPS NSS          sha384
Aug  4 15:12:46.837915:   SHA2_512                          IKEv1: IKE         IKEv2: IKE         FIPS NSS          sha512
Aug  4 15:12:46.837919:   IDENTITY                          IKEv1:             IKEv2:             FIPS             
Aug  4 15:12:46.837922: PRF algorithms:
Aug  4 15:12:46.837927:   HMAC_MD5                          IKEv1: IKE         IKEv2: IKE              native(HMAC) md5
Aug  4 15:12:46.837931:   HMAC_SHA1                         IKEv1: IKE         IKEv2: IKE         FIPS NSS          sha, sha1
Aug  4 15:12:46.837936:   HMAC_SHA2_256                     IKEv1: IKE         IKEv2: IKE         FIPS NSS          sha2, sha256, sha2_256
Aug  4 15:12:46.837950:   HMAC_SHA2_384                     IKEv1: IKE         IKEv2: IKE         FIPS NSS          sha384, sha2_384
Aug  4 15:12:46.837955:   HMAC_SHA2_512                     IKEv1: IKE         IKEv2: IKE         FIPS NSS          sha512, sha2_512
Aug  4 15:12:46.837959:   AES_XCBC                          IKEv1:             IKEv2: IKE              native(XCBC) aes128_xcbc
Aug  4 15:12:46.837962: Integrity algorithms:
Aug  4 15:12:46.837966:   HMAC_MD5_96                       IKEv1: IKE ESP AH  IKEv2: IKE ESP AH       native(HMAC) md5, hmac_md5
Aug  4 15:12:46.837984:   HMAC_SHA1_96                      IKEv1: IKE ESP AH  IKEv2: IKE ESP AH  FIPS NSS          sha, sha1, sha1_96, hmac_sha1
Aug  4 15:12:46.837995:   HMAC_SHA2_512_256                 IKEv1: IKE ESP AH  IKEv2: IKE ESP AH  FIPS NSS          sha512, sha2_512, sha2_512_256, hmac_sha2_512
Aug  4 15:12:46.837999:   HMAC_SHA2_384_192                 IKEv1: IKE ESP AH  IKEv2: IKE ESP AH  FIPS NSS          sha384, sha2_384, sha2_384_192, hmac_sha2_384
Aug  4 15:12:46.838005:   HMAC_SHA2_256_128                 IKEv1: IKE ESP AH  IKEv2: IKE ESP AH  FIPS NSS          sha2, sha256, sha2_256, sha2_256_128, hmac_sha2_256
Aug  4 15:12:46.838008:   HMAC_SHA2_256_TRUNCBUG            IKEv1:     ESP AH  IKEv2:         AH                   
Aug  4 15:12:46.838014:   AES_XCBC_96                       IKEv1:     ESP AH  IKEv2: IKE ESP AH       native(XCBC) aes_xcbc, aes128_xcbc, aes128_xcbc_96
Aug  4 15:12:46.838018:   AES_CMAC_96                       IKEv1:     ESP AH  IKEv2:     ESP AH  FIPS              aes_cmac
Aug  4 15:12:46.838023:   NONE                              IKEv1:     ESP     IKEv2: IKE ESP     FIPS              null
Aug  4 15:12:46.838026: DH algorithms:
Aug  4 15:12:46.838031:   NONE                              IKEv1:             IKEv2: IKE ESP AH  FIPS NSS(MODP)    null, dh0
Aug  4 15:12:46.838035:   MODP1536                          IKEv1: IKE ESP AH  IKEv2: IKE ESP AH       NSS(MODP)    dh5
Aug  4 15:12:46.838039:   MODP2048                          IKEv1: IKE ESP AH  IKEv2: IKE ESP AH  FIPS NSS(MODP)    dh14
Aug  4 15:12:46.838044:   MODP3072                          IKEv1: IKE ESP AH  IKEv2: IKE ESP AH  FIPS NSS(MODP)    dh15
Aug  4 15:12:46.838048:   MODP4096                          IKEv1: IKE ESP AH  IKEv2: IKE ESP AH  FIPS NSS(MODP)    dh16
Aug  4 15:12:46.838053:   MODP6144                          IKEv1: IKE ESP AH  IKEv2: IKE ESP AH  FIPS NSS(MODP)    dh17
Aug  4 15:12:46.838057:   MODP8192                          IKEv1: IKE ESP AH  IKEv2: IKE ESP AH  FIPS NSS(MODP)    dh18
Aug  4 15:12:46.838061:   DH19                              IKEv1: IKE         IKEv2: IKE ESP AH  FIPS NSS(ECP)     ecp_256, ecp256
Aug  4 15:12:46.838066:   DH20                              IKEv1: IKE         IKEv2: IKE ESP AH  FIPS NSS(ECP)     ecp_384, ecp384
Aug  4 15:12:46.838070:   DH21                              IKEv1: IKE         IKEv2: IKE ESP AH  FIPS NSS(ECP)     ecp_521, ecp521
Aug  4 15:12:46.838074:   DH31                              IKEv1: IKE         IKEv2: IKE ESP AH       NSS(ECP)     curve25519
Aug  4 15:12:46.838077: IPCOMP algorithms:
Aug  4 15:12:46.838081:   DEFLATE                           IKEv1:     ESP AH  IKEv2:     ESP AH  FIPS             
Aug  4 15:12:46.838085:   LZS                               IKEv1:             IKEv2:     ESP AH  FIPS             
Aug  4 15:12:46.838089:   LZJH                              IKEv1:             IKEv2:     ESP AH  FIPS             
Aug  4 15:12:46.838093: testing CAMELLIA_CBC:
Aug  4 15:12:46.838096:   Camellia: 16 bytes with 128-bit key
Aug  4 15:12:46.838162:   Camellia: 16 bytes with 128-bit key
Aug  4 15:12:46.838201:   Camellia: 16 bytes with 256-bit key
Aug  4 15:12:46.838243:   Camellia: 16 bytes with 256-bit key
Aug  4 15:12:46.838280: testing AES_GCM_16:
Aug  4 15:12:46.838284:   empty string
Aug  4 15:12:46.838319:   one block
Aug  4 15:12:46.838352:   two blocks
Aug  4 15:12:46.838385:   two blocks with associated data
Aug  4 15:12:46.838424: testing AES_CTR:
Aug  4 15:12:46.838428:   Encrypting 16 octets using AES-CTR with 128-bit key
Aug  4 15:12:46.838464:   Encrypting 32 octets using AES-CTR with 128-bit key
Aug  4 15:12:46.838502:   Encrypting 36 octets using AES-CTR with 128-bit key
Aug  4 15:12:46.838541:   Encrypting 16 octets using AES-CTR with 192-bit key
Aug  4 15:12:46.838576:   Encrypting 32 octets using AES-CTR with 192-bit key
Aug  4 15:12:46.838613:   Encrypting 36 octets using AES-CTR with 192-bit key
Aug  4 15:12:46.838651:   Encrypting 16 octets using AES-CTR with 256-bit key
Aug  4 15:12:46.838687:   Encrypting 32 octets using AES-CTR with 256-bit key
Aug  4 15:12:46.838724:   Encrypting 36 octets using AES-CTR with 256-bit key
Aug  4 15:12:46.838763: testing AES_CBC:
Aug  4 15:12:46.838766:   Encrypting 16 bytes (1 block) using AES-CBC with 128-bit key
Aug  4 15:12:46.838801:   Encrypting 32 bytes (2 blocks) using AES-CBC with 128-bit key
Aug  4 15:12:46.838841:   Encrypting 48 bytes (3 blocks) using AES-CBC with 128-bit key
Aug  4 15:12:46.838881:   Encrypting 64 bytes (4 blocks) using AES-CBC with 128-bit key
Aug  4 15:12:46.838928: testing AES_XCBC:
Aug  4 15:12:46.838932:   RFC 3566 Test Case 1: AES-XCBC-MAC-96 with 0-byte input
Aug  4 15:12:46.839126:   RFC 3566 Test Case 2: AES-XCBC-MAC-96 with 3-byte input
Aug  4 15:12:46.839291:   RFC 3566 Test Case 3: AES-XCBC-MAC-96 with 16-byte input
Aug  4 15:12:46.839444:   RFC 3566 Test Case 4: AES-XCBC-MAC-96 with 20-byte input
Aug  4 15:12:46.839600:   RFC 3566 Test Case 5: AES-XCBC-MAC-96 with 32-byte input
Aug  4 15:12:46.839756:   RFC 3566 Test Case 6: AES-XCBC-MAC-96 with 34-byte input
Aug  4 15:12:46.839937:   RFC 3566 Test Case 7: AES-XCBC-MAC-96 with 1000-byte input
Aug  4 15:12:46.840373:   RFC 4434 Test Case AES-XCBC-PRF-128 with 20-byte input (key length 16)
Aug  4 15:12:46.840529:   RFC 4434 Test Case AES-XCBC-PRF-128 with 20-byte input (key length 10)
Aug  4 15:12:46.840698:   RFC 4434 Test Case AES-XCBC-PRF-128 with 20-byte input (key length 18)
Aug  4 15:12:46.840990: testing HMAC_MD5:
Aug  4 15:12:46.840997:   RFC 2104: MD5_HMAC test 1
Aug  4 15:12:46.841200:   RFC 2104: MD5_HMAC test 2
Aug  4 15:12:46.841390:   RFC 2104: MD5_HMAC test 3
Aug  4 15:12:46.841582: testing HMAC_SHA1:
Aug  4 15:12:46.841585:   CAVP: IKEv2 key derivation with HMAC-SHA1
Aug  4 15:12:46.842055: 8 CPU cores online
Aug  4 15:12:46.842062: starting up 7 helper threads
Aug  4 15:12:46.842128: started thread for helper 0
Aug  4 15:12:46.842174: helper(1) seccomp security disabled for crypto helper 1
Aug  4 15:12:46.842188: started thread for helper 1
Aug  4 15:12:46.842219: helper(2) seccomp security disabled for crypto helper 2
Aug  4 15:12:46.842236: started thread for helper 2
Aug  4 15:12:46.842258: helper(3) seccomp security disabled for crypto helper 3
Aug  4 15:12:46.842269: started thread for helper 3
Aug  4 15:12:46.842296: helper(4) seccomp security disabled for crypto helper 4
Aug  4 15:12:46.842311: started thread for helper 4
Aug  4 15:12:46.842323: helper(5) seccomp security disabled for crypto helper 5
Aug  4 15:12:46.842346: started thread for helper 5
Aug  4 15:12:46.842369: helper(6) seccomp security disabled for crypto helper 6
Aug  4 15:12:46.842376: started thread for helper 6
Aug  4 15:12:46.842390: using Linux xfrm kernel support code on #1 SMP PREEMPT_DYNAMIC Thu Jul 20 09:11:28 EDT 2023
Aug  4 15:12:46.842393: helper(7) seccomp security disabled for crypto helper 7
Aug  4 15:12:46.842707: selinux support is NOT enabled.
Aug  4 15:12:46.842728: systemd watchdog not enabled - not sending watchdog keepalives
Aug  4 15:12:46.843813: seccomp security disabled
Aug  4 15:12:46.848083: listening for IKE messages
Aug  4 15:12:46.848252: Kernel supports NIC esp-hw-offload
Aug  4 15:12:46.848534: adding UDP interface ovn-k8s-mp0 10.129.0.2:500
Aug  4 15:12:46.848624: adding UDP interface ovn-k8s-mp0 10.129.0.2:4500
Aug  4 15:12:46.848654: adding UDP interface br-ex 169.254.169.2:500
Aug  4 15:12:46.848681: adding UDP interface br-ex 169.254.169.2:4500
Aug  4 15:12:46.848713: adding UDP interface br-ex 10.0.0.8:500
Aug  4 15:12:46.848740: adding UDP interface br-ex 10.0.0.8:4500
Aug  4 15:12:46.848767: adding UDP interface lo 127.0.0.1:500
Aug  4 15:12:46.848793: adding UDP interface lo 127.0.0.1:4500
Aug  4 15:12:46.848824: adding UDP interface lo [::1]:500
Aug  4 15:12:46.848853: adding UDP interface lo [::1]:4500
Aug  4 15:12:46.851160: loading secrets from "/etc/ipsec.secrets"
Aug  4 15:12:46.851214: no secrets filename matched "/etc/ipsec.d/*.secrets"
Aug  4 15:12:47.053369: loading secrets from "/etc/ipsec.secrets"

sh-4.4# tcpdump -i any esp
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
^C
0 packets captured

sh-5.1# ovn-nbctl --no-leader-only get nb_global . ipsec
false
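The last check above compares the CNO setting with what actually reached the OVN northbound database. A hedged Go sketch of that comparison, assuming the command's stdout has already been captured; parseNBIPsec is a hypothetical helper and does not run ovn-nbctl itself.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseNBIPsec interprets captured output of
//   ovn-nbctl --no-leader-only get nb_global . ipsec
// which prints a bare boolean followed by a newline.
func parseNBIPsec(out string) (bool, error) {
	v, err := strconv.ParseBool(strings.TrimSpace(out))
	if err != nil {
		return false, fmt.Errorf("unexpected nb_global ipsec value %q: %w", out, err)
	}
	return v, nil
}

func main() {
	// Output captured in the report above.
	enabled, err := parseNBIPsec("false\n")
	if err != nil {
		panic(err)
	}
	if !enabled {
		fmt.Println("IPsec requested in the CNO config but not enabled in the OVN northbound DB")
	}
}
```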

Version-Release number of selected component (if applicable):

openshift/cluster-network-operator#1874

How reproducible:

Always

Steps to Reproduce:

1. Install an OVN cluster and enable IPSec at runtime.

Actual results:

No ESP packets are seen between the nodes.

Expected results:

ESP traffic should be seen between the nodes.

Additional info:

https://github.com/openshift/cluster-network-operator/pull/1996

Bug OCPBUGS-19248: Update 4.15 telemeter image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/telemeter/pull/480

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/telemeter/pull/480

Bug OCPBUGS-24152: Update 4.15 ose-gcp-cloud-controller-manager-container image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/cloud-provider-gcp/pull/47

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cloud-provider-gcp/pull/47

Story MGMT-15860: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/assisted-installer-agent/pull/629

Bug OCPBUGS-20505: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/operator-framework-operator-controller/pull/28

Bug OCPBUGS-22357: Fix and bump library-go for storage operators

View the Description View the linked PRs

We need to fix and bump library-go for the HTTP/2 vulnerability CVE-2023-44487 (HTTP/2 Rapid Reset). This effectively turns off HTTP/2 on library-go HTTP endpoints, i.e. the metrics and health endpoints.

Bug OCPBUGS-13664: There is no clear error log when create sts cluster with KMS key without install role in it

View the Description View the linked PRs

Description of problem:

There is no clear error message when creating an STS cluster with a KMS key whose key policy does not include the installer role.

Version-Release number of selected component (if applicable):

How reproducible:

always

Steps to Reproduce:

1. Prepare a KMS key with the AWS CLI:
   aws kms create-key --tags TagKey=Purpose,TagValue=Test --description "kms Key"
2. Create an STS cluster with the KMS key:

rosa create cluster --cluster-name ying-k1 --sts --role-arn arn:aws:iam::301721915996:role/ying16-Installer-Role --support-role-arn arn:aws:iam::301721915996:role/ying16-Support-Role --controlplane-iam-role arn:aws:iam::301721915996:role/ying16-ControlPlane-Role --worker-iam-role arn:aws:iam::301721915996:role/ying16-Worker-Role --operator-roles-prefix ying-k1-e2g3 --oidc-config-id 23ggvdh2jouranue87r5ujskp8hctisn --region us-west-2 --version 4.12.15 --replicas 2 --compute-machine-type m5.xlarge --machine-cidr 10.0.0.0/16 --service-cidr 172.30.0.0/16 --pod-cidr 10.128.0.0/14 --host-prefix 23 --kms-key-arn arn:aws:kms:us-west-2:301721915996:key/c60b5a31-1a5c-4d73-93ee-67586d0eb90d

Actual results:

Cluster creation fails. Install log:
http://pastebin.test.redhat.com/1100008

Expected results:

There should be a detailed error message stating that the KMS key policy does not include the installer role.

Additional info:

Cluster creation succeeds if the installer role ARN is added to the KMS key policy:

{
    "Version": "2012-10-17",
    "Id": "key-default-1",
    "Statement": [
        {
            "Sid": "Enable IAM User Permissions",
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::301721915996:role/ying16-Installer-Role",
                    "arn:aws:iam::301721915996:root"
                ]
            },
            "Action": "kms:*",
            "Resource": "*"
        }
    ]
}

Bug OCPBUGS-19364: Update 4.15 ose-multus-cni image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/multus-cni/pull/183

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/multus-cni/pull/183

Bug OCPBUGS-18248: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-ingress-operator/pull/975

Bug OCPBUGS-19014: private-router network policy breaks ignition access for 4.13.z OCP clusters

View the Description View the linked PRs

Description of problem:

In 4.13.z releases, the request-serving label is not present in the ignition-server-proxy deployment. The network policy in place prevents egress from the private router to pods that do not have the label, resulting in the ignition-server endpoint not being available from the outside.
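The failure comes down to selector semantics: a matchLabels-style selector only matches pods that carry every listed key/value pair, so a pod missing the label is excluded from the allowed egress destinations. A minimal Go sketch of that rule (matchExpressions are ignored). The label key, value and the app label below are hypothetical placeholders, not necessarily the exact labels HyperShift uses.

```go
package main

import "fmt"

// matchesLabels reports whether pod labels satisfy a matchLabels-style selector:
// every key/value pair in the selector must be present on the pod.
// An empty selector matches every pod.
func matchesLabels(selector, podLabels map[string]string) bool {
	for k, v := range selector {
		if got, ok := podLabels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical label key/value used only for illustration.
	selector := map[string]string{"request-serving": "true"}
	unlabelled := map[string]string{"app": "ignition-server-proxy"} // label missing, as in 4.13.z
	labelled := map[string]string{"app": "ignition-server-proxy", "request-serving": "true"}
	fmt.Println("unlabelled proxy allowed:", matchesLabels(selector, unlabelled))
	fmt.Println("labelled proxy allowed:", matchesLabels(selector, labelled))
}
```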

Version-Release number of selected component (if applicable):

4.13.12 OCP, 4.14 HO

How reproducible:

Always

Steps to Reproduce:

1. Install latest HO
2. Create a HostedCluster with version 4.13.12
3. Wait for nodes to join

Actual results:

Nodes never join

Expected results:

Nodes join

Additional info:

Nodes do not join because egress from the private router to the ignition-server-proxy is blocked.

https://github.com/openshift/hypershift/pull/3012

Bug OCPBUGS-19188: Update 4.15 ose-ibm-vpc-block-csi-driver image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/ibm-vpc-block-csi-driver/pull/44

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/ibm-vpc-block-csi-driver/pull/44

Bug OCPBUGS-22276: docker.io rate limiting triggering issues with okd jobs

View the Description View the linked PRs

Description of problem:

Image stream imports from docker.io fail with Unauthorized errors:

8.1
  tagged from docker.io/openshift/wildfly-81-centos7:latest
    prefer registry pullthrough when referencing this tag

  Build and run WildFly 8.1 applications on CentOS 7. For more information about using this builder image, including OpenShift considerations, see https://github.com/openshift-s2i/s2i-wildfly/blob/master/README.md.
  Tags: builder, wildfly, java
  Supports: wildfly:8.1, jee, java
  Example Repo: https://github.com/openshift/openshift-jee-sample.git

  ! error: Import failed (Unauthorized): you may not have access to the container image "docker.io/openshift/wildfly-81-centos7:latest"
      20 minutes ago


error: imported completed with errors
[Mon Oct 23 15:23:32 UTC 2023] Retrying image import openshift/wildfly:10.1
error: tag latest failed: you may not have access to the container image "docker.io/openshift/wildfly-101-centos7:latest"
imagestream.image.openshift.io/wildfly imported with errors

Name:			wildfly
Namespace:		openshift
Created:		21 minutes ago

Version-Release number of selected component (if applicable):

4.14 / 4.15

How reproducible:

Often on vSphere jobs, perhaps because they lack a local mirror?

Steps to Reproduce:

1.
2.
3.

Actual results:

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/44127/rehearse-44127-periodic-ci-openshift-release-master-okd-scos-4.14-e2e-aws-ovn-serial/1716463869561409536

Expected results:

CI jobs run successfully.

Additional info:

https://github.com/openshift/origin/pull/28347

Bug OCPBUGS-25322: did not find "trackTimestampsStaleness: true" setting for kubelet/kubelet-minimal servicemonitor

View the Description View the linked PRs

This is a clone of issue OCPBUGS-25025. The following is the description of the original issue:
—
Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/cluster-monitoring-operator/pull/2201

Bug OCPBUGS-27092: Baremetal bootstrap logs no longer contain all services

View the Description View the linked PRs

Description of problem:

When bootstrap logs are collected (e.g. as part of a CI run when bootstrapping fails), they no longer contain logs from most of the Ironic services. These services used to run in standalone pods, but after a recent refactoring they run as systemd services.
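Because the Ironic services now run as systemd units, their output lives in the journal rather than in pod logs. A minimal Go sketch of building one journalctl query per unit for a log-gathering step. The helper name and the unit names are hypothetical, and the commands are only constructed here, not executed.

```go
package main

import (
	"fmt"
	"os/exec"
)

// journalCommands builds one journalctl invocation per systemd unit so that a
// log-gathering step can capture services that no longer run as pods.
// The commands are constructed but not run.
func journalCommands(units []string) []*exec.Cmd {
	cmds := make([]*exec.Cmd, 0, len(units))
	for _, u := range units {
		cmds = append(cmds, exec.Command("journalctl", "--no-pager", "-u", u))
	}
	return cmds
}

func main() {
	// Hypothetical unit names; the actual units on the bootstrap host may differ.
	for _, c := range journalCommands([]string{"ironic.service", "ironic-inspector.service"}) {
		fmt.Println(c.Args)
	}
}
```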

https://github.com/openshift/installer/pull/7854

Bug OCPBUGS-19267: Update 4.15 ose-azure-cloud-controller-manager image to be consistent with ART

View the Description View the linked PRs

Please review the following PR: https://github.com/openshift/cloud-provider-azure/pull/86

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cloud-provider-azure/pull/86

Bug OCPBUGS-19817: The traffic between worker node and external host got broken after delete ipsec-host pods

View the Description View the linked PRs

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:

Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.

Affected Platforms:

Is it one of the following?

internal CI failure
customer issue / SD
internal Red Hat testing failure

If it is an internal Red Hat testing failure:

Please share a kubeconfig or creds to a live cluster for the assignee to debug/troubleshoot along with reproducer steps (especially if it's a telco use case like ICNI, secondary bridges or BM+kubevirt).

If it is a CI failure:

Did it happen in different CI lanes? If so please provide links to multiple failures with the same error instance
Did it happen in both sdn and ovn jobs? If so please provide links to multiple failures with the same error instance
Did it happen on other platforms (e.g. AWS, Azure, GCP, bare metal)? If so please provide links to multiple failures with the same error instance
When did the failure start happening? Please provide the UTC timestamp of the networking outage window from a sample failure run
If it's a connectivity issue,
What is the srcNode, srcIP and srcNamespace and srcPodName?
What is the dstNode, dstIP and dstNamespace and dstPodName?
What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)

If it is a customer / SD issue:

Provide enough information in the bug description that Engineering doesn’t need to read the entire case history.
Don’t presume that Engineering has access to Salesforce.
Please provide must-gather and sos-report with an exact link to the comment in the support case with the attachment. The format should be: https://access.redhat.com/support/cases/#/case/<case number>/discussion?attachmentId=<attachment id>
Describe what each attachment is intended to demonstrate (failed pods, log errors, OVS issues, etc).
Referring to the attached must-gather, sosreport or other attachment, please provide the following details:
- If the issue is in a customer namespace then provide a namespace inspect.
- If it is a connectivity issue:
  - What is the srcNode, srcNamespace, srcPodName and srcPodIP?
  - What is the dstNode, dstNamespace, dstPodName and dstPodIP?
  - What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)
  - Please provide the UTC timestamp of the networking outage window from the must-gather
  - Please provide tcpdump pcaps taken during the outage filtered based on the above provided src/dst IPs
- If it is not a connectivity issue:
  - Describe the steps taken so far to analyze the logs from networking components (cluster-network-operator, OVNK, SDN, openvswitch, ovs-configure etc) and the actual component where the issue was seen based on the attached must-gather. Please attach snippets of relevant logs around the window when problem has happened if any.

For OCPBUGS in which the issue has been identified, label with “sbr-triaged”
For OCPBUGS in which the issue has not been identified and needs Engineering help for root cause, label with “sbr-untriaged”
Note: bugs that do not meet these minimum standards will be closed with label “SDN-Jira-template”

https://github.com/openshift/cluster-network-operator/pull/2087

Bug OCPBUGS-23062: Volume metrics test never passes

View the Description View the linked PRs

The following test is permafailing (see the Sippy link below):

[sig-storage] [Serial] Volume metrics PVC should create metrics for total time taken in volume operations in P/V Controller [Suite:openshift/conformance/serial] [Suite:k8s]

Example failure

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-aws-sdn-serial/1722165323320266752

The test does not run in every serial job, but whenever it does run, it fails, and it is often the only failing test in the run. This started only a few days ago, around the 4th.

Additional context here:

https://sippy.dptools.openshift.org/sippy-ng/tests/4.15/analysis?test=%5Bsig-storage%5D%20%5BSerial%5D%20Volume%20metrics%20PVC%20should%20create%20metrics%20for%20total%20time%20taken%20in%20volume%20operations%20in%20P%2FV%20Controller%20%5BSuite%3Aopenshift%2Fconformance%2Fserial%5D%20%5BSuite%3Ak8s%5D&filters=%7B%22items%22%3A%5B%7B%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22equals%22%2C%22value%22%3A%22%5Bsig-storage%5D%20%5BSerial%5D%20Volume%20metrics%20PVC%20should%20create%20metrics%20for%20total%20time%20taken%20in%20volume%20operations%20in%20P%2FV%20Controller%20%5BSuite%3Aopenshift%2Fconformance%2Fserial%5D%20%5BSuite%3Ak8s%5D%22%7D%2C%7B%22columnField%22%3A%22variants%22%2C%22not%22%3Atrue%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%22never-stable%22%7D%2C%7B%22columnField%22%3A%22variants%22%2C%22not%22%3Atrue%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%22aggregated%22%7D%5D%2C%22linkOperator%22%3A%22and%22%7D

https://github.com/openshift/csi-external-provisioner/pull/78

Bug OCPBUGS-18569: CNO pod restart in hypershift CI

View the Description View the linked PRs

We are seeing CNO pod restarts flake in hypershift CI on the hypershift control plane:

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_hypershift/2967/pull-ci-openshift-hypershift-main-e2e-kubevirt-aws-ovn/1699008879737704448/artifacts/e2e-kubevirt-aws-ovn/run-e2e-local/artifacts/TestCreateCluster/namespaces/e2e-clusters-pvhd5-example-s6skm/core/pods/logs/cluster-network-operator-78fd774c97-7w7dg-cluster-network-operator-previous.log

W0905 11:42:53.359515       1 builder.go:106] graceful termination failed, controllers failed with error: failed to get infrastructure name: infrastructureName not set in infrastructure 'cluster'

The current backoff is retry.DefaultBackoff, which is appropriate for 409 Conflict errors and retries for less than 1s in total:

var DefaultBackoff = wait.Backoff{
	Steps:    4,
	Duration: 10 * time.Millisecond,
	Factor:   5.0,
	Jitter:   0.1,
}

Elsewhere in the codebase, retry.DefaultBackoff is used with retry.RetryOnConflict() where it is appropriate, but we need to retry for much longer here and much less frequently.

https://github.com/openshift/cluster-network-operator/pull/1986

Bug OCPBUGS-23314: CLI outputs stack trace when creating a new cluster

View the Description View the linked PRs

Description of problem:

A stack trace is printed when creating a hosted cluster via the hypershift CLI.

Version-Release number of selected component (if applicable):

How reproducible:

Always

Steps to Reproduce:

1. Run hypershift create cluster aws ... to create a hosted cluster

Actual results:

The output will contain:
[controller-runtime] log.SetLogger(...) was never called; logs will not be displayed.
Detected at:
	>  goroutine 1 [running]:
	>  runtime/debug.Stack()
	>  	/opt/homebrew/Cellar/go/1.21.4/libexec/src/runtime/debug/stack.go:24 +0x64
	>  sigs.k8s.io/controller-runtime/pkg/log.eventuallyFulfillRoot()
	>  	/Users/xinjiang/Codes/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/log/log.go:60 +0xa0
	>  sigs.k8s.io/controller-runtime/pkg/log.(*delegatingLogSink).WithName(0x14000845480, {0x10321d605, 0x14})
	>  	/Users/xinjiang/Codes/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/log/deleg.go:147 +0x34
	>  github.com/go-logr/logr.Logger.WithName({{0x10490a710, 0x14000845480}, 0x0}, {0x10321d605, 0x14})
	>  	/Users/xinjiang/Codes/hypershift/vendor/github.com/go-logr/logr/logr.go:336 +0x5c
	>  sigs.k8s.io/controller-runtime/pkg/client.newClient(0x1400097a900, {0x0, 0x140004a42a0, {0x0, 0x0}, 0x0, {0x0, 0x0}, 0x0})
	>  	/Users/xinjiang/Codes/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/client/client.go:122 +0xf8
	>  sigs.k8s.io/controller-runtime/pkg/client.New(0x14000ef98c0, {0x0, 0x140004a42a0, {0x0, 0x0}, 0x0, {0x0, 0x0}, 0x0})
	>  	/Users/xinjiang/Codes/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/client/client.go:103 +0x78
	>  github.com/openshift/hypershift/cmd/util.GetClient()
	>  	/Users/xinjiang/Codes/hypershift/cmd/util/client.go:50 +0x4f4
	>  github.com/openshift/hypershift/cmd/cluster/core.apply({0x104906d88, 0x140008dfb80}, {{0x10490acf8, 0x140009dd410}, 0x0}, 0x14000ef9200, 0x0, 0x0)
	>  	/Users/xinjiang/Codes/hypershift/cmd/cluster/core/create.go:324 +0xc8
	>  github.com/openshift/hypershift/cmd/cluster/core.CreateCluster({0x104906d88, 0x140008dfb80}, 0x1400056f600, 0x1048c4360)
	>  	/Users/xinjiang/Codes/hypershift/cmd/cluster/core/create.go:461 +0x264
	>  github.com/openshift/hypershift/cmd/cluster/aws.CreateCluster({0x104906d88, 0x140008dfb80}, 0x1400056f600)
	>  	/Users/xinjiang/Codes/hypershift/cmd/cluster/aws/create.go:79 +0x78
	>  github.com/openshift/hypershift/cmd/cluster/aws.NewCreateCommand.func1(0x14000d1ac00, {0x14000a6cf70, 0x0, 0xd})
	>  	/Users/xinjiang/Codes/hypershift/cmd/cluster/aws/create.go:65 +0x148
	>  github.com/spf13/cobra.(*Command).execute(0x14000d1ac00, {0x1400014c040, 0xd, 0xe})
	>  	/Users/xinjiang/Codes/hypershift/vendor/github.com/spf13/cobra/command.go:940 +0x90c
	>  github.com/spf13/cobra.(*Command).ExecuteC(0x14000c91800)
	>  	/Users/xinjiang/Codes/hypershift/vendor/github.com/spf13/cobra/command.go:1068 +0x770
	>  github.com/spf13/cobra.(*Command).Execute(0x14000c91800)
	>  	/Users/xinjiang/Codes/hypershift/vendor/github.com/spf13/cobra/command.go:992 +0x30
	>  github.com/spf13/cobra.(*Command).ExecuteContext(0x14000c91800, {0x104906d88, 0x140008dfb80})
	>  	/Users/xinjiang/Codes/hypershift/vendor/github.com/spf13/cobra/command.go:985 +0x70
	>  main.main()
	>  	/Users/xinjiang/Codes/hypershift/main.go:70 +0x46c
2023-11-15T18:24:26+08:00	INFO	Applied Kube resource	{"kind": "Namespace", "namespace": "", "name": "clusters"}

Expected results:

No stack trace is printed.

Additional info:

Functionality is not affected; the cluster is still created.

https://github.com/openshift/hypershift/pull/3199

Bug OCPBUGS-19235: Update 4.15 ose-cluster-bootstrap image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-bootstrap/pull/100

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-bootstrap/pull/100

Bug OCPBUGS-25698: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/csi-external-provisioner/pull/86

Bug OCPBUGS-19230: Update 4.15 marketplace-operator image to be consistent with ART

Please review the following PR: https://github.com/operator-framework/operator-marketplace/pull/539

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/operator-framework/operator-marketplace/pull/539

Bug OCPBUGS-19444: AgentClusterInstall changes on load aren't respected

When we load the AgentClusterInstall manifest from disk, we sometimes make changes to it.

For example, after the fix for OCPBUGS-7495, we rewrite any lowercase platform name to mixed case, because for a while we required lowercase even though mixed case is correct.

In 4.14, we set userManagedNetworking to true when platform: none is used, even if the user didn't specify it in the ZTP manifests, because the ZTP controller applies the same default.

However, these changes aren't taking effect, because they aren't passed through to the manifest that is included in the Agent ISO.

https://github.com/openshift/installer/pull/7506

Bug OCPBUGS-19666: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/kubernetes/pull/1724

Bug OCPBUGS-20381: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/oc/pull/1578

Bug OCPBUGS-21938: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cloud-provider-openstack/pull/235

Bug OCPBUGS-24213: kube-apiserver TLS artifacts should have ownership annotations

https://github.com/openshift/cluster-kube-apiserver-operator/pull/1568

Bug OCPBUGS-18549: Significant 12 minute pod-to-host disruption detected on aws ovn minor upgrades

DISCLAIMER: The code for measuring disruption in-cluster is extremely new, so we cannot be 100% confident that what we're seeing is real. However, the bug below demonstrates a problem that occurs only in a very specific configuration while all others are unaffected, which gives us some confidence that it is real.

https://grafana-loki.ci.openshift.org/d/ISnBj4LVk/disruption?orgId=1&var-platform=aws&var-percentile=P50&var-backend=pod-to-host-new-connections&var-releases=4.14&var-upgrade_type=minor&var-networks=sdn&var-networks=ovn&var-topologies=ha&var-architectures=amd64&var-min_job_runs=10&var-lookback=1&var-master_node_updated=Y&from=now-7d&to=now

affects pod-to-host-new-connections
affects AWS minor upgrades, which see over 14000s of disruption at P50
does not affect pod-to-host-reused-connections
does not affect any other clouds
does not affect micro upgrades
does not affect pod-to-service or pod-to-pod backends
does not affect SDN

The total disruption is the sum across a number of pods, so the actual duration of the disruption is roughly the total divided by 14. The actual disruption appears to last about 12 minutes and hits all pods doing pod-to-host monitoring simultaneously.

Sample job (taken from the expanded "Most Recent Runs" panel in Grafana):

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-multiarch-master-nightly-4.14-ocp-e2e-aws-ovn-heterogeneous-upgrade/1698740856976052224

In the first spyglass chart for the upgrade, the batch of disruption is visible from 7:28:19 to 7:40:03.

We do not have data from before OVN interconnect landed, so we cannot say whether this started at that time.

https://github.com/openshift/ovn-kubernetes/pull/1907

Bug OCPBUGS-19135: Update 4.15 ose-cluster-kube-cluster-api-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-api-operator/pull/24

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-api-operator/pull/24

Bug OCPBUGS-26048: The default channel is not correct

Description of problem:

The default channel of 4.15 and 4.16 clusters is stable-4.14.

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-2024-01-03-193825

How reproducible:

Always

Steps to Reproduce:

    1. Install a 4.16 cluster
    2. Check default channel
# oc adm upgrade 
warning: Cannot display available updates:
  Reason: VersionNotFound
  Message: Unable to retrieve available updates: currently reconciling cluster version 4.16.0-0.nightly-2024-01-03-193825 not found in the "stable-4.14" channel

Cluster version is 4.16.0-0.nightly-2024-01-03-193825

Upgradeable=False

  Reason: MissingUpgradeableAnnotation
  Message: Cluster operator cloud-credential should not be upgraded between minor versions: Upgradeable annotation cloudcredential.openshift.io/upgradeable-to on cloudcredential.operator.openshift.io/cluster object needs updating before upgrade. See Manually Creating IAM documentation for instructions on preparing a cluster for upgrade.

Upstream is unset, so the cluster will use an appropriate default.
Channel: stable-4.14

    3.

Actual results:

Default channel is stable-4.14 in a 4.16 cluster

Expected results:

Default channel should be stable-4.16 in a 4.16 cluster

Additional info:

4.15 cluster has the issue as well.
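
One way to avoid a stale default, sketched under the assumption that the default channel should track the installed release's major.minor version, is to derive `stable-<major>.<minor>` from the release version string. `defaultChannel` is a hypothetical helper, not the installer's code, and the linked PR may fix the problem differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// defaultChannel derives "stable-<major>.<minor>" from a release version such
// as "4.16.0-0.nightly-2024-01-03-193825". Hypothetical helper.
func defaultChannel(version string) (string, error) {
	parts := strings.SplitN(version, ".", 3)
	if len(parts) < 2 {
		return "", fmt.Errorf("cannot parse major.minor from %q", version)
	}
	// Both major and minor must be plain integers.
	for _, p := range parts[:2] {
		if _, err := strconv.Atoi(p); err != nil {
			return "", fmt.Errorf("cannot parse major.minor from %q: %w", version, err)
		}
	}
	return fmt.Sprintf("stable-%s.%s", parts[0], parts[1]), nil
}

func main() {
	ch, err := defaultChannel("4.16.0-0.nightly-2024-01-03-193825")
	if err != nil {
		panic(err)
	}
	fmt.Println(ch) // stable-4.16
}
```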

https://github.com/openshift/installer/pull/7867

Bug OCPBUGS-17286: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-api/pull/181

Bug OCPBUGS-22075: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/whereabouts-cni/pull/206

Bug OCPBUGS-22629: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cloud-provider-powervs/pull/58

Bug OCPBUGS-15504: Remove kube-apiserver PrometheusRule from manifests

Description of problem:

0000_90_kube-apiserver-operator_04_servicemonitor-apiserver lists the PrometheusRule `kube-apiserver`, which is meant to be deleted by the CVO (it has the `release.openshift.io/delete: "true"` annotation).
This manifest is no longer needed, as the `cluster:apiserver_current_inflight_requests:sum:max_over_time:2m` recording rule is already provided by other PrometheusRules.

If this was meant to be removed in 4.13, it's safe to remove the manifest in 4.14: we don't allow skipping 4.13, so by the time users start the 4.14 update, the CVO will already have removed this object from their clusters.
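
The role the annotation plays can be sketched as a simple partition step: manifests carrying `release.openshift.io/delete: "true"` are treated as objects to remove rather than apply. This is a hypothetical Go illustration (the `manifest` type and `partition` function are not CVO code), assuming only the annotation semantics described above.

```go
package main

import "fmt"

// manifest is a hypothetical, minimal view of a release manifest's metadata.
type manifest struct {
	Kind        string
	Name        string
	Annotations map[string]string
}

// deleteAnnotation marks a manifest as an object to remove rather than apply.
const deleteAnnotation = "release.openshift.io/delete"

// partition splits manifests into those to apply and those to delete.
// Reading a nil Annotations map is safe and yields "".
func partition(ms []manifest) (apply, remove []manifest) {
	for _, m := range ms {
		if m.Annotations[deleteAnnotation] == "true" {
			remove = append(remove, m)
		} else {
			apply = append(apply, m)
		}
	}
	return apply, remove
}

func main() {
	ms := []manifest{
		{Kind: "ServiceMonitor", Name: "example"}, // hypothetical entry
		{Kind: "PrometheusRule", Name: "kube-apiserver", Annotations: map[string]string{deleteAnnotation: "true"}},
	}
	apply, remove := partition(ms)
	fmt.Println(len(apply), len(remove)) // 1 1
}
```

Once every supported update path has passed through a release that performed the deletion, the annotated manifest has nothing left to delete, which is why it can be dropped from later releases.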

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

https://github.com/openshift/cluster-kube-apiserver-operator/pull/1543

Task OU-286: Dev Console: Add data-test-id for dashboard panels so e2e test don't rely on panel names

Background

e2e console tests use the panel titles to assert that the panels exist. Titles can change, so an id can be added so that tests do not rely on titles.

Outcomes

A `data-test-id` attribute is added to panels; the `data-test` attribute containing the title remains unchanged.

https://github.com/openshift/console/pull/13340

Task HOSTEDCP-1184: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/hypershift/pull/3008

Bug OCPBUGS-7893: TaskRun duration chart legend shows only 4 taskruns

Description of problem:
The TaskRun duration chart on the pipeline's "Metrics" tab shows only 4 TaskRuns in the legend, regardless of how many TaskRuns are on the chart.

Expected results:

All TaskRuns should be displayed in the legend.

https://github.com/openshift/console/pull/13077

Bug OCPBUGS-19714: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/console/pull/13229

Bug OCPBUGS-20033: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/hypershift/pull/3113

Bug OCPBUGS-24012: Tuned Node Profile takes up to 30 minutes post OpenShift Container Platform 4 - Node creation before it's being created

Some OpenShift Container Platform 4 nodes were found to be missing certain settings applied via tuned. While investigating, we found that it can take 30 minutes or more for the tuned Profile of a newly added node to be created.

After increasing the log level of the cluster-node-tuning-operator pod, the following events are recorded:

I1128 13:05:12.465193       1 controller.go:1121] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (add)
I1128 13:05:12.465235       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:05:12.465247       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:05:12.465255       1 profilecalculator.go:131] Node's new-worker-X.example.com providerID=aws:///eu-central-1c/i-0874090641dd61eef
I1128 13:05:12.465268       1 controller.go:300] sync(): Node new-worker-X.example.com label(s) changed
I1128 13:05:12.465288       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:05:12.486200       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:05:12.486233       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:05:12.486242       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:05:12.486256       1 controller.go:300] sync(): Node new-worker-X.example.com label(s) changed
I1128 13:05:12.486273       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:05:12.612063       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:05:12.612114       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:05:12.612127       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:05:12.612149       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:05:15.232435       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:05:15.232477       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:05:15.232541       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:05:15.232565       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:05:22.805108       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:05:22.805142       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:05:22.805151       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:05:22.805170       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:05:30.803481       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:05:30.803511       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:05:30.803519       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:05:30.803533       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:05:35.815894       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:05:35.815933       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:05:35.815942       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:05:35.815958       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:05:35.832338       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:05:35.832386       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:05:35.832395       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:05:35.832419       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:05:35.851291       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:05:35.851337       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:05:35.851349       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:05:35.851369       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:05:40.855159       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:05:40.855192       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:05:40.855201       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:05:40.855221       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:05:48.004741       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:05:48.004783       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:05:48.004815       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:05:48.004835       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:05:48.011986       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:05:48.012035       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:05:48.012047       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:05:48.012067       1 controller.go:300] sync(): Node new-worker-X.example.com label(s) changed
I1128 13:05:48.012090       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:05:53.475798       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:05:53.475842       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:05:53.475855       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:05:53.475876       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:05:56.097269       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:05:56.097299       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:05:56.097309       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:05:56.097329       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:05:58.497782       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:05:58.497838       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:05:58.497847       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:05:58.497864       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:06:06.117201       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:06:06.117235       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:06:06.117254       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:06:06.117271       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:06:08.008992       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:06:08.009031       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:06:08.009041       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:06:08.009059       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:06:09.685949       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:06:09.685988       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:06:09.685997       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:06:09.686015       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:06:11.163882       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:06:11.163929       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:06:11.163941       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:06:11.163965       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:06:19.730972       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:06:19.731005       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:06:19.731013       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:06:19.731028       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:06:23.713627       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:06:23.713665       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:06:23.713675       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:06:23.713693       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:07:52.133190       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:07:52.133227       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:07:52.133235       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:07:52.133268       1 controller.go:300] sync(): Node new-worker-X.example.com label(s) changed
I1128 13:07:52.133285       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:07:55.779247       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:07:55.779278       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:07:55.779286       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:07:55.779324       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:07:55.799941       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:07:55.799975       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:07:55.799983       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:07:55.800021       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:07:56.062048       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:07:56.062081       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:07:56.062089       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:07:56.062126       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:09:58.224261       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:09:58.224294       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:09:58.224303       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:09:58.224333       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:10:08.146467       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:10:08.146504       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:10:08.146513       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:10:08.146549       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:10:29.293368       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:10:29.293402       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:10:29.293410       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:10:29.293440       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:11:38.765691       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:11:38.781424       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:11:38.781432       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:11:38.781471       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:15:35.022263       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:15:35.022303       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:15:35.022312       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:15:35.022349       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:20:41.252897       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:20:41.252942       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:20:41.252951       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:20:41.252988       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:21:38.768157       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:21:38.781098       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:21:38.781103       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:21:38.781133       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:25:47.684402       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:25:47.684445       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:25:47.684457       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:25:47.684494       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:25:53.336668       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:25:53.336700       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:25:53.336709       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:25:53.336738       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:25:57.754420       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:25:57.754453       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:25:57.754462       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:25:57.754491       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:26:03.987123       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:26:03.987188       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:26:03.987203       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:26:03.987258       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:26:38.231524       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:26:38.231558       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:26:38.231566       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:26:38.231602       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:27:08.845310       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:27:08.845349       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:27:08.845358       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:27:08.845398       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:27:49.797881       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:27:49.797919       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:27:49.797928       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:27:49.797958       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:27:49.856526       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:27:49.856566       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:27:49.856575       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:27:49.856612       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:27:49.904286       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:27:49.904341       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:27:49.904350       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:27:49.904400       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:30:02.351363       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:30:02.351398       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:30:02.351407       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:30:02.351440       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:30:03.719303       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:30:03.719338       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:30:03.719347       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:30:03.719380       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:30:33.316267       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:30:33.316297       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:30:33.316307       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:30:33.316336       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:30:33.330998       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:30:33.331030       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:30:33.331038       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:30:33.331066       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:31:31.688121       1 controller.go:221] sync(): Kind profile: openshift-cluster-node-tuning-operator/new-worker-X.example.com
I1128 13:31:31.688136       1 controller.go:374] sync(): Profile new-worker-X.example.com
I1128 13:31:31.688300       1 profilecalculator.go:164] calculateProfile(new-worker-X.example.com)
I1128 13:31:31.688337       1 controller.go:677] syncProfile(): Profile new-worker-X.example.com not found, creating one [openshift-node]
I1128 13:31:31.688396       1 request.go:1073] Request Body: {"kind":"Profile","apiVersion":"tuned.openshift.io/v1","metadata":{"name":"new-worker-X.example.com","namespace":"openshift-cluster-node-tuning-operator","creationTimestamp":null,"ownerReferences":[{"apiVersion":"tuned.openshift.io/v1","kind":"Tuned","name":"default","uid":"324f82ad-4475-4b49-ac29-57cb454314e7","controller":true,"blockOwnerDeletion":true}]},"spec":{"config":{"tunedProfile":"openshift-node","debug":false,"tunedConfig":{"reapply_sysctl":null}}},"status":{"bootcmdline":"","tunedProfile":"","conditions":[{"type":"Applied","status":"Unknown","lastTransitionTime":"2023-11-28T13:31:31Z"},{"type":"Degraded","status":"Unknown","lastTransitionTime":"2023-11-28T13:31:31Z"}]}}
I1128 13:31:31.698807       1 request.go:1073] Response Body: {"apiVersion":"tuned.openshift.io/v1","kind":"Profile","metadata":{"creationTimestamp":"2023-11-28T13:31:31Z","generation":1,"managedFields":[{"apiVersion":"tuned.openshift.io/v1","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"324f82ad-4475-4b49-ac29-57cb454314e7\"}":{}}},"f:spec":{".":{},"f:config":{".":{},"f:debug":{},"f:tunedConfig":{},"f:tunedProfile":{}}}},"manager":"cluster-node-tuning-operator","operation":"Update","time":"2023-11-28T13:31:31Z"}],"name":"new-worker-X.example.com","namespace":"openshift-cluster-node-tuning-operator","ownerReferences":[{"apiVersion":"tuned.openshift.io/v1","blockOwnerDeletion":true,"controller":true,"kind":"Tuned","name":"default","uid":"324f82ad-4475-4b49-ac29-57cb454314e7"}],"resourceVersion":"9673729653","uid":"8607cf52-9a00-49d2-baff-8a97c73b809a"},"spec":{"config":{"debug":false,"tunedConfig":{},"tunedProfile":"openshift-node"}}}
I1128 13:31:31.698915       1 controller.go:687] created profile new-worker-X.example.com [openshift-node]
I1128 13:31:31.698925       1 controller.go:209] event from workqueue (profile/openshift-cluster-node-tuning-operator/new-worker-X.example.com) successfully processed
I1128 13:31:31.702309       1 controller.go:1121] add event to workqueue due to *v1.Profile, Namespace=openshift-cluster-node-tuning-operator, Name=new-worker-X.example.com (add)
I1128 13:31:31.702335       1 controller.go:221] sync(): Kind profile: openshift-cluster-node-tuning-operator/new-worker-X.example.com
I1128 13:31:31.702358       1 controller.go:374] sync(): Profile new-worker-X.example.com
I1128 13:31:31.702494       1 profilecalculator.go:164] calculateProfile(new-worker-X.example.com)
I1128 13:31:31.713444       1 controller.go:752] syncProfile(): updating Profile new-worker-X.example.com [openshift-node]
I1128 13:31:31.713543       1 request.go:1073] Request Body: {"kind":"Profile","apiVersion":"tuned.openshift.io/v1","metadata":{"name":"new-worker-X.example.com","namespace":"openshift-cluster-node-tuning-operator","uid":"8607cf52-9a00-49d2-baff-8a97c73b809a","resourceVersion":"9673729653","generation":1,"creationTimestamp":"2023-11-28T13:31:31Z","ownerReferences":[{"apiVersion":"tuned.openshift.io/v1","kind":"Tuned","name":"default","uid":"324f82ad-4475-4b49-ac29-57cb454314e7","controller":true,"blockOwnerDeletion":true}],"managedFields":[{"manager":"cluster-node-tuning-operator","operation":"Update","apiVersion":"tuned.openshift.io/v1","time":"2023-11-28T13:31:31Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"324f82ad-4475-4b49-ac29-57cb454314e7\"}":{}}},"f:spec":{".":{},"f:config":{".":{},"f:debug":{},"f:tunedConfig":{},"f:tunedProfile":{}}}}}]},"spec":{"config":{"tunedProfile":"openshift-node","debug":false,"tunedConfig":{"reapply_sysctl":null},"providerName":"aws"}},"status":{"bootcmdline":"","tunedProfile":"","conditions":[{"type":"Applied","status":"Unknown","lastTransitionTime":"2023-11-28T13:31:31Z"},{"type":"Degraded","status":"Unknown","lastTransitionTime":"2023-11-28T13:31:31Z"}]}}
I1128 13:31:31.713611       1 round_trippers.go:466] curl -v -XPUT  -H "User-Agent: cluster-node-tuning-operator/v0.0.0 (linux/amd64) kubernetes/$Format" -H "Accept: application/json, */*" -H "Authorization: Bearer <masked>" -H "Content-Type: application/json" 'https://172.16.0.1:443/apis/tuned.openshift.io/v1/namespaces/openshift-cluster-node-tuning-operator/profiles/new-worker-X.example.com'
I1128 13:31:31.720708       1 round_trippers.go:553] PUT https://172.16.0.1:443/apis/tuned.openshift.io/v1/namespaces/openshift-cluster-node-tuning-operator/profiles/new-worker-X.example.com 200 OK in 7 milliseconds
I1128 13:31:31.720855       1 request.go:1073] Response Body: {"apiVersion":"tuned.openshift.io/v1","kind":"Profile","metadata":{"creationTimestamp":"2023-11-28T13:31:31Z","generation":2,"managedFields":[{"apiVersion":"tuned.openshift.io/v1","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"324f82ad-4475-4b49-ac29-57cb454314e7\"}":{}}},"f:spec":{".":{},"f:config":{".":{},"f:debug":{},"f:providerName":{},"f:tunedConfig":{},"f:tunedProfile":{}}}},"manager":"cluster-node-tuning-operator","operation":"Update","time":"2023-11-28T13:31:31Z"}],"name":"new-worker-X.example.com","namespace":"openshift-cluster-node-tuning-operator","ownerReferences":[{"apiVersion":"tuned.openshift.io/v1","blockOwnerDeletion":true,"controller":true,"kind":"Tuned","name":"default","uid":"324f82ad-4475-4b49-ac29-57cb454314e7"}],"resourceVersion":"9673729659","uid":"8607cf52-9a00-49d2-baff-8a97c73b809a"},"spec":{"config":{"debug":false,"providerName":"aws","tunedConfig":{},"tunedProfile":"openshift-node"}}}
I1128 13:31:31.720946       1 controller.go:757] updated profile new-worker-X.example.com [openshift-node]
I1128 13:31:31.720955       1 controller.go:209] event from workqueue (profile/openshift-cluster-node-tuning-operator/new-worker-X.example.com) successfully processed
I1128 13:31:31.721160       1 controller.go:1136] add event to workqueue due to *v1.Profile, Namespace=openshift-cluster-node-tuning-operator, Name=new-worker-X.example.com (update)
I1128 13:31:31.724833       1 controller.go:221] sync(): Kind profile: openshift-cluster-node-tuning-operator/new-worker-X.example.com
I1128 13:31:31.724847       1 controller.go:374] sync(): Profile new-worker-X.example.com
I1128 13:31:31.724971       1 profilecalculator.go:164] calculateProfile(new-worker-X.example.com)
I1128 13:31:31.726987       1 controller.go:742] syncProfile(): no need to update Profile new-worker-X.example.com
I1128 13:31:31.726993       1 controller.go:209] event from workqueue (profile/openshift-cluster-node-tuning-operator/new-worker-X.example.com) successfully processed
I1128 13:31:32.273200       1 controller.go:1136] add event to workqueue due to *v1.Profile, Namespace=openshift-cluster-node-tuning-operator, Name=new-worker-X.example.com (update)
I1128 13:31:32.273234       1 controller.go:221] sync(): Kind profile: openshift-cluster-node-tuning-operator/new-worker-X.example.com
I1128 13:31:32.273246       1 controller.go:374] sync(): Profile new-worker-X.example.com
I1128 13:31:32.273410       1 profilecalculator.go:164] calculateProfile(new-worker-X.example.com)
I1128 13:31:32.284388       1 controller.go:742] syncProfile(): no need to update Profile new-worker-X.example.com
I1128 13:31:32.284400       1 controller.go:209] event from workqueue (profile/openshift-cluster-node-tuning-operator/new-worker-X.example.com) successfully processed
I1128 13:31:38.766803       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:31:38.769582       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:31:38.769588       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:31:38.769617       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed
I1128 13:35:39.839137       1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update)
I1128 13:35:39.839174       1 controller.go:221] sync(): Kind node: /new-worker-X.example.com
I1128 13:35:39.839182       1 controller.go:282] sync(): Node new-worker-X.example.com
I1128 13:35:39.839215       1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed

So at 13:05:12 the OpenShift Container Platform 4 node `new-worker-X.example.com` did become available, but it took until 13:31:31 for the tuned profile to be created, and therefore for the required settings to be applied on the node.
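To quantify the gap described above, a minimal Python sketch can parse the klog timestamp prefix (severity letter, `MMDD`, then `hh:mm:ss.uuuuuu`) of the `created profile` line and subtract the time the node became available. klog omits the year, so the sketch assumes one (2023, matching the timestamps in the API response bodies). `klog_time` and `profile_delay` are hypothetical helper names, not part of any OpenShift tooling.

```python
import re
from datetime import datetime

# klog header: severity letter (I/W/E/F), MMDD, space, hh:mm:ss.microseconds
KLOG_HEADER = re.compile(r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})")


def klog_time(line: str, year: int = 2023) -> datetime:
    """Parse the timestamp of a klog line; the year is an assumption (klog omits it)."""
    match = KLOG_HEADER.match(line)
    if match is None:
        raise ValueError(f"not a klog line: {line[:40]!r}")
    mmdd, clock = match.groups()
    return datetime.strptime(f"{year}{mmdd} {clock}", "%Y%m%d %H:%M:%S.%f")


def profile_delay(node_ready: datetime, log_lines) -> float:
    """Seconds between the node becoming available and the first 'created profile' line."""
    for line in log_lines:
        if "created profile" in line:
            return (klog_time(line) - node_ready).total_seconds()
    raise LookupError("no 'created profile' line found")


created = ("I1128 13:31:31.698915       1 controller.go:687] "
           "created profile new-worker-X.example.com [openshift-node]")
delay = profile_delay(datetime(2023, 11, 28, 13, 5, 12), [created])
print(f"{delay:.0f} s (~{delay / 60:.0f} min)")  # prints: 1580 s (~26 min)
```

Feeding the full operator log instead of a single line works the same way, since the helper stops at the first matching line.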

https://github.com/openshift/machine-config-operator/pull/4048

Bug OCPBUGS-23553: update packages in ironic-agent

Update package versions in the ironic-agent container to bring in the latest fixes.

Bug OCPBUGS-18848: Update 4.15 ose-multus-route-override-cni image to be consistent with ART

Please review the following PR: https://github.com/openshift/route-override-cni/pull/48

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/route-override-cni/pull/48

Bug OCPBUGS-22699: IPXE connection timed out

Description of problem:

A new BM IPI deployment using the provisioning network with IPv6 shows the following error:

"http://XXXX:XXXX:XXXX:XXXX::X:6180/images/ironic-python-agent.kernel....
connection timed out (http://ipxe.org/4c0a6092)"

Version-Release number of selected component (if applicable):

OpenShift 4.12.32
Also seen in OpenShift 4.14.0-rc.5 when adding new nodes

How reproducible:

Very frequent

Steps to Reproduce:

1. Deploy a cluster on bare metal (BM) using the provided config.

Actual results:

Consistent failures, depending on the OCP version used for the deployment

Expected results:

No error, successful deployment

Additional info:

Things checked while the bootstrap host is active and the installation information is still valid (and failing):
- Tried downloading the "ironic-python-agent.kernel" file from several places (the bootstrap host, bastion hosts, and another provisioned host); it worked in all cases:
[core@control-1-ru2 ~]$ curl -6 -v -o ironic-python-agent.kernel http://[XXXX:XXXX:XXXX:XXXX::X]:80/images/ironic-python-agent.kernel
*   Trying XXXX:XXXX:XXXX:XXXX::X...
* TCP_NODELAY set
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                               Dload  Upload   Total   Spent    Left  Speed
0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Connected to XXXX:XXXX:XXXX:XXXX::X (xxxx:xxxx:xxxx:xxxx::x) port 80 (#0)
> GET /images/ironic-python-agent.kernel HTTP/1.1
> Host: [xxxx:xxxx:xxxx:xxxx::x]
> User-Agent: curl/7.61.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Fri, 27 Oct 2023 08:28:09 GMT
< Server: Apache
< Last-Modified: Thu, 26 Oct 2023 08:42:16 GMT
< ETag: "a29d70-6089a8c91c494"
< Accept-Ranges: bytes
< Content-Length: 10657136
<
{ [14084 bytes data]
100 10.1M  100 10.1M    0     0   597M      0 --:--:-- --:--:-- --:--:--  597M
* Connection #0 to host xxxx:xxxx:xxxx:xxxx::x left intact

This verifies some of the components, such as the network setup and the httpd service running in the ironic pods.

- Also gathered a listing of the contents of the ironic pod running in podman, especially the shared directory. The contents of /shared/html/inspector.ipxe look correct compared to a working installation, and all files appear to be in place.

- The logs from the ironic container show the errors coming from the node being deployed; the curl request is included for comparison:

xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx - - [27/Oct/2023:08:19:55 +0000] "GET /images/ironic-python-agent.kernel HTTP/1.1" 400 226 "-" "iPXE/1.0.0+ (4bd064de)"
xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx - - [27/Oct/2023:08:19:55 +0000] "GET /images/ironic-python-agent.kernel HTTP/1.1" 400 226 "-" "iPXE/1.0.0+ (4bd064de)"
xxxx:xxxx:xxxx:xxxx::x - - [27/Oct/2023:08:20:23 +0000] "GET /images/ironic-python-agent.kernel HTTP/1.1" 200 10657136 "-" "curl/7.61.1"
xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx - - [27/Oct/2023:08:20:23 +0000] "GET /images/ironic-python-agent.kernel HTTP/1.1" 400 226 "-" "iPXE/1.0.0+ (4bd064de)"
xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx - - [27/Oct/2023:08:20:23 +0000] "GET /images/ironic-python-agent.kernel HTTP/1.1" 400 226 "-" "iPXE/1.0.0+ (4bd064de)"

Since the iPXE requests get HTTP 400 while the curl request for the same file gets 200, this seems to be an issue with iPXE and IPv6.
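To make this comparison repeatable over a longer access log, a small Python sketch can tally the HTTP status codes returned to each user agent. It assumes the httpd lines follow the Apache combined log format, which the entries above appear to use; `status_by_agent` is a hypothetical name and the sample host addresses are shortened placeholders.

```python
import re
from collections import Counter, defaultdict

# Assumed Apache combined log format:
# host ident user [time] "request" status size "referer" "user-agent"
ACCESS_LOG = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"'
)


def status_by_agent(lines):
    """Map each user agent to a Counter of the HTTP status codes it received."""
    result = defaultdict(Counter)
    for line in lines:
        match = ACCESS_LOG.match(line)
        if match:  # lines that do not fit the assumed format are skipped
            status, agent = match.groups()
            result[agent][int(status)] += 1
    return result


sample = [
    'xxxx::x - - [27/Oct/2023:08:19:55 +0000] "GET /images/ironic-python-agent.kernel HTTP/1.1" 400 226 "-" "iPXE/1.0.0+ (4bd064de)"',
    'xxxx::x - - [27/Oct/2023:08:20:23 +0000] "GET /images/ironic-python-agent.kernel HTTP/1.1" 200 10657136 "-" "curl/7.61.1"',
]
for agent, codes in status_by_agent(sample).items():
    print(agent, dict(codes))
```

Grouping by user agent rather than by client address keeps the iPXE and curl requests apart even when they come from the same host.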

https://github.com/openshift/ironic-image/pull/446

Bug OCPBUGS-24310: Update 4.15 ose-csi-driver-shared-resource-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/csi-driver-shared-resource/pull/157

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/csi-driver-shared-resource/pull/157

Task MON-3383: Remove weak MD5 cryptographic primitive usage to comply with newer CWE

The security team will soon start having code owners also address CWEs (Common Weakness Enumeration). Although this is not a CVE per se, it may have security ramifications.

This issue addresses usages of the weak MD5 primitive in CMO (cluster-monitoring-operator).
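As a hedged illustration of the kind of change involved, assuming an MD5 use that only fingerprints content (for example, to detect configuration changes) rather than protecting secrets, the hash can be swapped for SHA-256. CMO is written in Go, where the analogous swap would be from `crypto/md5` to `crypto/sha256`; this Python sketch and its `config_hash` name are hypothetical and not taken from CMO.

```python
import hashlib
import json


def config_hash(config: dict) -> str:
    """Hypothetical helper: stable, non-secret content hash of a config object.

    Uses SHA-256 instead of MD5; keys are sorted so that equal configs
    hash identically regardless of key order.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


print(config_hash({"retention": "24h", "replicas": 2}))
```

One consequence of such a swap: every digest changes once, so anything that stored the old MD5 values will see a one-time mismatch.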

https://cwe.mitre.org/data/definitions/1240.html

https://github.com/openshift/cluster-monitoring-operator/pull/2086

Bug OCPBUGS-18832: shortname for FAR Template not correct in console resource badge

Description of problem:

The console does not allow customizing the abbreviation that appears on the resource icon badge. This causes an issue for the FAR operator's CRD FenceAgentRemediationTemplate: the badge icon shows FART. The CRD includes a custom short name, but the console ignores it.

Version-Release number of selected component (if applicable):

How reproducible:

100%

Steps to Reproduce:

1. Create the CRD (included link to GitHub)
2. Navigate to Home -> Search
3. Enter "far" into the Resources filter

Actual results:

The badge FART is shown in the dropdown

Expected results:

The badge should show fartemplate, the content of the short name
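The expected behavior can be sketched as a small rule in Python: use the first entry of the CRD's `spec.names.shortNames` when present, otherwise fall back to an abbreviation of the kind. The fallback here (uppercase letters of the kind, at most four) is an assumption that happens to reproduce the FART badge from this report; it is not taken from the console source, and `badge_text` is a hypothetical name.

```python
import re
from typing import Optional, Sequence


def badge_text(kind: str, short_names: Optional[Sequence[str]] = None) -> str:
    """Hypothetical badge rule: prefer the CRD's first short name.

    The fallback (uppercase letters of the kind, truncated to 4 characters)
    is an assumed approximation of the current console behavior.
    """
    if short_names:
        return short_names[0]
    letters = re.sub(r"[^A-Z]", "", kind) or kind.upper()
    return letters[:4]


print(badge_text("FenceAgentRemediationTemplate"))                   # FART
print(badge_text("FenceAgentRemediationTemplate", ["fartemplate"]))  # fartemplate
```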

Additional info:

https://github.com/openshift/console/pull/13162

Bug OCPBUGS-19148: Update 4.15 ose-openstack-cloud-controller-manager image to be consistent with ART

Please review the following PR: https://github.com/openshift/cloud-provider-openstack/pull/215

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cloud-provider-openstack/pull/215

Story TRT-1329: Port Kube event intervals and pathological framework to structured

This is not going to be pretty. It is likely mostly a re-implementation, given that everything was coded to use regexes that depend on the old locator format and on keys appearing in a specific order. We need a new way to define matchers that uses structured intervals.

We also have some very complex logic around hashing the message to get it into the locator. There is possible duplication between watchevents/event.go and duplicated_events.go.

This will be quite delicate and probably very time-consuming.
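The idea of matching on structured intervals instead of regexes can be sketched in Python. The real framework is Go code in openshift/origin; `Interval`, `Matcher`, and the locator and message keys and values below are hypothetical, chosen only to show that key lookups do not depend on how a locator string is serialized or in what order its keys appear.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Interval:
    # Hypothetical structured interval: locator and message as key/value maps
    locator: Dict[str, str]
    message: Dict[str, str]


@dataclass
class Matcher:
    # Every listed key must be present with the given value; key order is irrelevant
    locator_keys: Dict[str, str] = field(default_factory=dict)
    message_keys: Dict[str, str] = field(default_factory=dict)

    def matches(self, interval: Interval) -> bool:
        return all(
            interval.locator.get(k) == v for k, v in self.locator_keys.items()
        ) and all(
            interval.message.get(k) == v for k, v in self.message_keys.items()
        )


backoff = Matcher(locator_keys={"namespace": "openshift-etcd"},
                  message_keys={"reason": "BackOff"})
event = Interval(locator={"pod": "etcd-0", "namespace": "openshift-etcd"},
                 message={"reason": "BackOff", "humanMessage": "Back-off restarting"})
print(backoff.matches(event))  # True
```

A matcher with no keys matches every interval, so a real framework would likely want to reject empty matchers explicitly.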

https://github.com/openshift/origin/pull/28399

Bug OCPBUGS-6515: [CI Watcher] ConsoleExternalLogLink CRD creates, displays, modifies, and deletes a new ConsoleExternalLogLink - CypressError: Timed out

Description of problem:

ConsoleExternalLogLink CRD.ConsoleExternalLogLink CRD creates, displays, modifies, and deletes a new ConsoleExternalLogLink instance
AssertionError: Timed out retrying after 30000ms: Expected to find element: `[data-test-id=test-nubya-cell]`, but never found it.

https://search.ci.openshift.org/?search=creates%2C+displays%2C+modifies%2C+and+deletes+a+new+ConsoleExternalLogLink+instance&maxAge=168h&context=1&type=junit&name=pull-ci-openshift-console-master-e2e-gcp-console&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

https://github.com/openshift/console/pull/13110

Bug OCPBUGS-19166: Update 4.15 csi-driver-manila-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/csi-driver-manila-operator/pull/204

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Bug OCPBUGS-22869: OVN secondary network annotation timeout in hosted pod using Kubevirt provider

Description of problem:

A net-attach-def using "type: ovn-k8s-cni-overlay, topology:layer2"
does not work in a hosted pod when using the Kubevirt provider.

Note: As a general hosted multus sanity check, using a "type: bridge" NAD does work properly in a hosted pod and both interfaces start as expected:
  Normal  AddedInterface  86s   multus             Add eth0 [10.133.0.21/23] from ovn-kubernetes
  Normal  AddedInterface  86s   multus             Add net1 [192.0.2.193/27] from default/bridge-net

Version-Release number of selected component (if applicable):

OCP 4.14.1
CNV 4.14.0-2385

How reproducible:

Reproduced on multiple attempts when using the OVN secondary network

Steps to Reproduce:

1. Create the NAD on the hosted Kubevirt cluster:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: l2-network
spec:
  config: |-
    {
      "cniVersion": "0.3.1",
      "name": "l2-network",
      "type": "ovn-k8s-cni-overlay",
      "topology":"layer2",
      "netAttachDefName": "default/l2-network"
    }

2. Create a hosted pod with that network annotation:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks:  '[
      {
        "name": "l2-network",
        "interface": "net1",
        "ips": [
          "192.0.2.22/24"
          ]
      }
    ]'
  name: debug-ovnl2-c
  namespace: default
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault
    runAsNonRoot: true
    runAsUser: 1000
  containers:
  - name: debug-ovnl2-c
    command:
    - /usr/bin/bash
    - -x
    - -c
    - |
      sleep infinity
    image: quay.io/cloud-bulldozer/uperf:latest
    imagePullPolicy: Always
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
  nodeSelector:
    kubernetes.io/hostname: kv1-a8a5d7f1-9xwm4

3. The pod remains in ContainerCreating because it cannot create the net1 interface. Events from pod describe:

Events:
  Type     Reason                  Age    From               Message
  ----     ------                  ----   ----               -------
  Normal   Scheduled               4m21s  default-scheduler  Successfully assigned default/debug-ovnl2-c to kv1-a8a5d7f1-9xwm4
  Warning  FailedCreatePodSandBox  2m20s  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_debug-ovnl2-c_default_1b42bc5a-1148-49d8-a2d0-7689a46f59ea_0(1e2d9008074c3c5af5ccbb2e7e2e7ca2466395b642a1677db2dfadd35eb84b73): error adding pod default_debug-ovnl2-c to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: '&{ContainerID:1e2d9008074c3c5af5ccbb2e7e2e7ca2466395b642a1677db2dfadd35eb84b73 Netns:/var/run/netns/5da048e3-b534-481d-acc6-2ddc6a439586 IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=debug-ovnl2-c;K8S_POD_INFRA_CONTAINER_ID=1e2d9008074c3c5af5ccbb2e7e2e7ca2466395b642a1677db2dfadd35eb84b73;K8S_POD_UID=1b42bc5a-1148-49d8-a2d0-7689a46f59ea Path: StdinData:[123 34 98 105 110 68 105 114 34 58 34 47 118 97 114 47 108 105 98 47 99 110 105 47 98 105 110 34 44 34 99 104 114 111 111 116 68 105 114 34 58 34 47 104 111 115 116 114 111 111 116 34 44 34 99 108 117 115 116 101 114 78 101 116 119 111 114 107 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 99 110 105 47 110 101 116 46 100 47 49 48 45 111 118 110 45 107 117 98 101 114 110 101 116 101 115 46 99 111 110 102 34 44 34 99 110 105 67 111 110 102 105 103 68 105 114 34 58 34 47 104 111 115 116 47 101 116 99 47 99 110 105 47 110 101 116 46 100 34 44 34 99 110 105 86 101 114 115 105 111 110 34 58 34 48 46 51 46 49 34 44 34 100 97 101 109 111 110 83 111 99 107 101 116 68 105 114 34 58 34 47 114 117 110 47 109 117 108 116 117 115 47 115 111 99 107 101 116 34 44 34 103 108 111 98 97 108 78 97 109 101 115 112 97 99 101 115 34 58 34 100 101 102 97 117 108 116 44 111 112 101 110 115 104 105 102 116 45 109 117 108 116 117 115 44 111 112 101 110 115 104 105 102 116 45 115 114 105 111 118 45 110 101 116 119 111 114 107 45 111 112 101 114 97 116 111 114 34 44 34 108 111 103 76 101 118 101 108 34 58 34 118 101 
114 98 111 115 101 34 44 34 108 111 103 84 111 83 116 100 101 114 114 34 58 116 114 117 101 44 34 109 117 108 116 117 115 65 117 116 111 99 111 110 102 105 103 68 105 114 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 99 110 105 47 110 101 116 46 100 34 44 34 109 117 108 116 117 115 67 111 110 102 105 103 70 105 108 101 34 58 34 97 117 116 111 34 44 34 110 97 109 101 34 58 34 109 117 108 116 117 115 45 99 110 105 45 110 101 116 119 111 114 107 34 44 34 110 97 109 101 115 112 97 99 101 73 115 111 108 97 116 105 111 110 34 58 116 114 117 101 44 34 112 101 114 78 111 100 101 67 101 114 116 105 102 105 99 97 116 101 34 58 123 34 98 111 111 116 115 116 114 97 112 75 117 98 101 99 111 110 102 105 103 34 58 34 47 118 97 114 47 108 105 98 47 107 117 98 101 108 101 116 47 107 117 98 101 99 111 110 102 105 103 34 44 34 99 101 114 116 68 105 114 34 58 34 47 101 116 99 47 99 110 105 47 109 117 108 116 117 115 47 99 101 114 116 115 34 44 34 99 101 114 116 68 117 114 97 116 105 111 110 34 58 34 50 52 104 34 44 34 101 110 97 98 108 101 100 34 58 116 114 117 101 125 44 34 115 111 99 107 101 116 68 105 114 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 115 111 99 107 101 116 34 44 34 116 121 112 101 34 58 34 109 117 108 116 117 115 45 115 104 105 109 34 125]} ContainerID:"1e2d9008074c3c5af5ccbb2e7e2e7ca2466395b642a1677db2dfadd35eb84b73" Netns:"/var/run/netns/5da048e3-b534-481d-acc6-2ddc6a439586" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=debug-ovnl2-c;K8S_POD_INFRA_CONTAINER_ID=1e2d9008074c3c5af5ccbb2e7e2e7ca2466395b642a1677db2dfadd35eb84b73;K8S_POD_UID=1b42bc5a-1148-49d8-a2d0-7689a46f59ea" Path:"" ERRORED: error configuring pod [default/debug-ovnl2-c] networking: [default/debug-ovnl2-c/1b42bc5a-1148-49d8-a2d0-7689a46f59ea:l2-network]: error adding container to network "l2-network": CNI request failed with status 400: '[default/debug-ovnl2-c 
1e2d9008074c3c5af5ccbb2e7e2e7ca2466395b642a1677db2dfadd35eb84b73 network l2-network NAD default/l2-network] [default/debug-ovnl2-c 1e2d9008074c3c5af5ccbb2e7e2e7ca2466395b642a1677db2dfadd35eb84b73 network l2-network NAD default/l2-network] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
'
'
  Warning  FailedCreatePodSandBox  19s  kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_debug-ovnl2-c_default_1b42bc5a-1148-49d8-a2d0-7689a46f59ea_0(48110f0ecc0979992108e4441ff06f50c0d90f527cbe0b8fe1ca18d5398b67eb): error adding pod default_debug-ovnl2-c to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: '&{ContainerID:48110f0ecc0979992108e4441ff06f50c0d90f527cbe0b8fe1ca18d5398b67eb Netns:/var/run/netns/cae8fab7-80c2-40b7-b1a7-49c8fc8732b2 IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=debug-ovnl2-c;K8S_POD_INFRA_CONTAINER_ID=48110f0ecc0979992108e4441ff06f50c0d90f527cbe0b8fe1ca18d5398b67eb;K8S_POD_UID=1b42bc5a-1148-49d8-a2d0-7689a46f59ea Path: StdinData:[123 34 98 105 110 68 105 114 34 58 34 47 118 97 114 47 108 105 98 47 99 110 105 47 98 105 110 34 44 34 99 104 114 111 111 116 68 105 114 34 58 34 47 104 111 115 116 114 111 111 116 34 44 34 99 108 117 115 116 101 114 78 101 116 119 111 114 107 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 99 110 105 47 110 101 116 46 100 47 49 48 45 111 118 110 45 107 117 98 101 114 110 101 116 101 115 46 99 111 110 102 34 44 34 99 110 105 67 111 110 102 105 103 68 105 114 34 58 34 47 104 111 115 116 47 101 116 99 47 99 110 105 47 110 101 116 46 100 34 44 34 99 110 105 86 101 114 115 105 111 110 34 58 34 48 46 51 46 49 34 44 34 100 97 101 109 111 110 83 111 99 107 101 116 68 105 114 34 58 34 47 114 117 110 47 109 117 108 116 117 115 47 115 111 99 107 101 116 34 44 34 103 108 111 98 97 108 78 97 109 101 115 112 97 99 101 115 34 58 34 100 101 102 97 117 108 116 44 111 112 101 110 115 104 105 102 116 45 109 117 108 116 117 115 44 111 112 101 110 115 104 105 102 116 45 115 114 105 111 118 45 110 101 116 119 111 114 107 45 111 112 101 114 97 116 111 114 34 44 34 108 111 103 76 101 118 101 108 34 58 34 118 101 114 98 111 
115 101 34 44 34 108 111 103 84 111 83 116 100 101 114 114 34 58 116 114 117 101 44 34 109 117 108 116 117 115 65 117 116 111 99 111 110 102 105 103 68 105 114 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 99 110 105 47 110 101 116 46 100 34 44 34 109 117 108 116 117 115 67 111 110 102 105 103 70 105 108 101 34 58 34 97 117 116 111 34 44 34 110 97 109 101 34 58 34 109 117 108 116 117 115 45 99 110 105 45 110 101 116 119 111 114 107 34 44 34 110 97 109 101 115 112 97 99 101 73 115 111 108 97 116 105 111 110 34 58 116 114 117 101 44 34 112 101 114 78 111 100 101 67 101 114 116 105 102 105 99 97 116 101 34 58 123 34 98 111 111 116 115 116 114 97 112 75 117 98 101 99 111 110 102 105 103 34 58 34 47 118 97 114 47 108 105 98 47 107 117 98 101 108 101 116 47 107 117 98 101 99 111 110 102 105 103 34 44 34 99 101 114 116 68 105 114 34 58 34 47 101 116 99 47 99 110 105 47 109 117 108 116 117 115 47 99 101 114 116 115 34 44 34 99 101 114 116 68 117 114 97 116 105 111 110 34 58 34 50 52 104 34 44 34 101 110 97 98 108 101 100 34 58 116 114 117 101 125 44 34 115 111 99 107 101 116 68 105 114 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 115 111 99 107 101 116 34 44 34 116 121 112 101 34 58 34 109 117 108 116 117 115 45 115 104 105 109 34 125]} ContainerID:"48110f0ecc0979992108e4441ff06f50c0d90f527cbe0b8fe1ca18d5398b67eb" Netns:"/var/run/netns/cae8fab7-80c2-40b7-b1a7-49c8fc8732b2" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=debug-ovnl2-c;K8S_POD_INFRA_CONTAINER_ID=48110f0ecc0979992108e4441ff06f50c0d90f527cbe0b8fe1ca18d5398b67eb;K8S_POD_UID=1b42bc5a-1148-49d8-a2d0-7689
a46f59ea" Path:"" ERRORED: error configuring pod [default/debug-ovnl2-c] networking: [default/debug-ovnl2-c/1b42bc5a-1148-49d8-a2d0-7689a46f59ea:l2-network]: error adding container to network "l2-network": CNI request failed with status 400: '[default/debug-ovnl2-c 48110f0ecc0979992108e4441ff06f50c0d90f527cbe0b8fe1ca18d5398b67eb network l2-network NAD default/l2-network] [default/debug-ovnl2-c 48110f0ecc0979992108e4441ff06f50c0d90f527cbe0b8fe1ca18d5398b67eb network l2-network NAD default/l2-network] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
'
'
  Normal  AddedInterface  18s (x3 over 4m20s)  multus  Add eth0 [10.133.0.21/23] from ovn-kubernetes
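The `StdinData` fields in the events above hold the CNI configuration handed to multus-shim, printed as decimal byte values (Go's default formatting of a byte slice). A small Python helper can turn them back into JSON; `str.split()` treats line breaks as whitespace, so arrays wrapped across lines still decode. Decoding the arrays above this way yields the multus-shim configuration, including `"cniVersion":"0.3.1"` and `"type":"multus-shim"`. `decode_stdin_data` is a hypothetical helper name.

```python
import json


def decode_stdin_data(byte_values: str) -> dict:
    """Decode a space-separated list of decimal byte values into a JSON object."""
    raw = bytes(int(token) for token in byte_values.split())
    return json.loads(raw.decode("utf-8"))


# Opening brace (123) plus the last bytes of the StdinData arrays above
sample = "123 34 116 121 112 101 34 58 34 109 117 108 116 117 115 45 115 104 105 109 34 125"
print(decode_stdin_data(sample))  # {'type': 'multus-shim'}
```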

Actual results:

Pod cannot start

Expected results:

Pod can start with additional "ovn-k8s-cni-overlay" network

Additional info:

Slack thread: https://redhat-internal.slack.com/archives/C02UVQRJG83/p1698857051578159
I confirmed that the same NAD and pod definition start fine on the management cluster.

https://github.com/openshift/cluster-network-operator/pull/2113

Bug OCPBUGS-24136: Update 4.15 vmware-vsphere-syncer-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/vmware-vsphere-csi-driver/pull/99

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/vmware-vsphere-csi-driver/pull/99

Bug OCPBUGS-23977: After Patternfly5 Update: Knative Service Name Bar not visible in Topology view

After the PatternFly 5 update, the Knative Service name bar is not visible in the Topology view.

Refer to this:

https://drive.google.com/file/d/1_KAotzs4WC8g2oW0OymTA_cGm-xabXlq/view?usp=sharing

https://github.com/openshift/console/pull/13376

Story TRT-1374: Hypershift broken in CI payloads on failure to update cloud-controller-manager-operator service

The CVO is reporting:

Could not update service "openshift-cloud-controller-manager-operator/cloud-controller-manager-operator"
(111 of 613): resource may have been deleted

Reported by the HyperShift team, who were the first to notice: https://redhat-internal.slack.com/archives/C01CQA76KMX/p1701279683347589

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-hypershift-release-4.15-periodics-e2e-aws-ovn/1729892601563189248

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-hypershift-release-4.15-periodics-e2e-aws-ovn/1729892601563189248/artifacts/e2e-aws-ovn/run-e2e/artifacts/TestCreateCluster/hostedcluster-example-wtq8t/cluster-scoped-resources/config.openshift.io/clusterversions.yaml

https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/303

Bug OCPBUGS-24071: Update 4.15 ose-csi-snapshot-controller-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/csi-external-snapshotter/pull/115

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/csi-external-snapshotter/pull/115

Bug OCPBUGS-10191: Update 4.14 csi-driver-manila image to be consistent with ART

Please review the following PR: https://github.com/openshift/cloud-provider-openstack/pull/188

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cloud-provider-openstack/pull/188

Bug OCPBUGS-18853: Update 4.15 openshift-enterprise-base image to be consistent with ART

Please review the following PR: https://github.com/openshift/images/pull/148

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/images/pull/148

Bug OCPBUGS-19253: Update 4.15 ose-cluster-kube-storage-version-migrator-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-kube-storage-version-migrator-operator/pull/94

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-kube-storage-version-migrator-operator/pull/94

Bug OCPBUGS-20049: Agent-based install on vSphere with multiple workers fails

Description of problem:

Agent-based install on vSphere with multiple workers fails

Version-Release number of selected component (if applicable):

4.13.4

How reproducible:

Always

Steps to Reproduce:

1. Create agent-config and install-config files for a cluster with 3 masters and 3 or more workers
2. Create the agent ISO image
3. Boot the target hosts from the agent ISO

Actual results:

Deployment hangs waiting on cluster operators

Expected results:

Deployment completes

Additional info:

Multiple pods cannot start due to tainted nodes: "4 node(s) had untolerated taint {node.cloudprovider.kubernetes.io/uninitialized: true}"

https://github.com/openshift/assisted-installer/pull/739

Bug OCPBUGS-27186: CNO IPsec API

This bug tracks the work needed to merge the CNO IPsec API backports.

https://github.com/openshift/api/pull/1667

https://github.com/openshift/cluster-network-operator/pull/2200

Bug OCPBUGS-21649: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-api-provider-aws/pull/479

Bug OCPBUGS-23741: Bump cluster-dns-operator to Kubernetes 1.28 for 4.15

Description of problem

The cluster-dns-operator repository vendors k8s.io/* v0.27.2 and controller-runtime v0.15.0. OpenShift 4.15 is based on Kubernetes 1.28.

Version-Release number of selected component (if applicable)

4.15.

How reproducible

Always.

Steps to Reproduce

Check https://github.com/openshift/cluster-dns-operator/blob/release-4.15/go.mod.

Actual results

The k8s.io/* packages are at v0.27.2, and the sigs.k8s.io/controller-runtime package is at v0.15.0.

Expected results

The k8s.io/* packages are at v0.28.0 or newer, and the sigs.k8s.io/controller-runtime package is at v0.16.0 or newer.

Additional info

The controller-runtime v0.16 release includes some breaking changes; see the release notes at https://github.com/kubernetes-sigs/controller-runtime/releases/tag/v0.16.0.
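A quick way to check the Actual and Expected results above is to scan the require lines of go.mod. The Python sketch below is a hypothetical helper, not part of the cluster-dns-operator tooling; the list of checked modules and their minimum versions are assumptions derived from the Expected results, and replace directives are not resolved.

```python
import re

# Assumed minimums, taken from the Expected results: k8s.io/* at v0.28.0+
# and controller-runtime at v0.16.0+. Only these modules are checked.
MINIMUMS = {
    "k8s.io/api": (0, 28, 0),
    "k8s.io/apimachinery": (0, 28, 0),
    "k8s.io/client-go": (0, 28, 0),
    "sigs.k8s.io/controller-runtime": (0, 16, 0),
}

# Matches "require mod vX.Y.Z" and the "mod vX.Y.Z" lines inside a require block.
# Replace directives are not resolved; only the version written on the line is read.
_REQUIRE_LINE = re.compile(r"^\s*(?:require\s+)?(\S+)\s+v(\d+)\.(\d+)\.(\d+)")


def outdated_modules(gomod_text: str) -> dict[str, str]:
    """Return {module: version} for checked modules below their minimum version."""
    outdated = {}
    for line in gomod_text.splitlines():
        match = _REQUIRE_LINE.match(line)
        if not match:
            continue
        module = match.group(1)
        version = tuple(int(part) for part in match.group(2, 3, 4))
        minimum = MINIMUMS.get(module)
        if minimum is not None and version < minimum:
            outdated[module] = "v" + ".".join(str(part) for part in version)
    return outdated
```

Run against the release-4.15 go.mod described above, this would report k8s.io/* at v0.27.2 and controller-runtime at v0.15.0 as outdated.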

https://github.com/openshift/cluster-dns-operator/pull/395

Bug MGMT-16335: vSphere disk UUID property not reported properly

Description of the problem:

We have a validation on vSphere that ensures the disk UUID property is set. However, the agent reports a fake disk in appliance mode, with the "hasUUID" property always set to false.

How reproducible:

100%

Steps to reproduce:

1. Try to install on vSphere

Actual results:

The UUID validation always fails

Expected results:

The UUID validation passes if the UUID property is set on the VM

https://github.com/openshift/assisted-installer-agent/pull/634

Story TRT-1361: Hypershift failures blocking CI payloads

Two payloads in a row failed: the first had more failures, and the second had fewer but was still broken.

Both exhibit this status on the console operator:

  status:
    conditions:
    - lastTransitionTime: "2023-11-17T06:06:57Z"
      message: 'OAuthClientSyncDegraded: the server is currently unable to handle
        the request (get oauthclients.oauth.openshift.io console)'
      reason: OAuthClientSync_FailedRegister
      status: "True"
      type: Degraded

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-hypershift-release-4.15-periodics-e2e-aws-ovn/1725383840110743552/artifacts/e2e-aws-ovn/run-e2e/artifacts/TestUpgradeControlPlane/hostedcluster-example-dcxq4/cluster-scoped-resources/config.openshift.io/clusteroperators.yaml

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-hypershift-release-4.15-periodics-e2e-aws-ovn/1725305112194191360/artifacts/e2e-aws-ovn/run-e2e/artifacts/TestUpgradeControlPlane/hostedcluster-example-c7bz4/cluster-scoped-resources/config.openshift.io/clusteroperators.yaml

We suspect this PR, although it merged before the payloads started failing; perhaps the issue only surfaces on upgrades once the change is in an accepted payload: https://github.com/openshift/console-operator/pull/808

There is also a hypershift PR, https://github.com/openshift/hypershift/pull/3151, that was only present in the second failed payload. It may have been a reaction to the problem that did not fully fix it, since the second payload had fewer failures than the first. If so, this will complicate a revert.

Discussion: https://redhat-internal.slack.com/archives/C01C8502FMM/p1700226091335339

https://github.com/openshift/console-operator/pull/813

Bug OCPBUGS-8079: one extra $ before {{ $labels.reason }} for description of ClusterOperatorDown alert

Description of problem:

In the description of the ClusterOperatorDown alert, there is an extra $ before {{ $labels.reason }}:

$ oc -n openshift-cluster-version get prometheusrules cluster-version-operator -oyaml
....
    - alert: ClusterOperatorDown
      annotations:
        description: The {{ $labels.name }} operator may be down or disabled because
          ${{ $labels.reason }}, and the components it manages may be unavailable
          or degraded.  Cluster upgrades may not complete. For more information refer
          to 'oc get -o yaml clusteroperator {{ $labels.name }}'{{ with $console_url
          := "console_url" | query }}{{ if ne (len (label "url" (first $console_url
          ) ) ) 0}} or {{ label "url" (first $console_url ) }}/settings/cluster/{{
          end }}{{ end }}.
        summary: Cluster operator has not been available for 10 minutes.
      expr: |
        max by (namespace, name, reason) (cluster_operator_up{job="cluster-version-operator"} == 0)
      for: 10m
      labels:
        severity: critical

When the ClusterOperatorDown alert fires, the description looks like this:

The insights operator may be down or disabled because $UploadFailed, and the components it manages....

If this is intended, we could close this bug.
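A stray `$` like this can be caught before it ships by linting rule annotations. In Go templates, which Prometheus uses for alert annotations, text outside `{{ ... }}` actions is copied verbatim, so a `$` right before an action appears in the rendered message. The Python sketch below is a hypothetical lint helper operating on a rule parsed into a dict; it is not part of the cluster-version-operator, and it will also flag an intentional literal `$` before an action.

```python
import re

# Text outside {{ ... }} is emitted as-is by Go templates, so "$" followed by
# an action renders literally: "${{ $labels.reason }}" -> "$UploadFailed".
STRAY_DOLLAR = re.compile(r"\$\{\{")


def stray_dollar_annotations(rule: dict) -> list[str]:
    """Return the names of a rule's annotations containing a literal '$' before '{{'."""
    return [
        name
        for name, text in rule.get("annotations", {}).items()
        if STRAY_DOLLAR.search(text)
    ]
```

Applied to the ClusterOperatorDown rule above, it would flag the `description` annotation and leave `summary` alone.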

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-02-27-101545

How reproducible:

always

https://github.com/openshift/cluster-version-operator/pull/992

Task MON-3528: Fix Prometheus downstream manifest file

https://github.com/openshift/prometheus/pull/186

Bug OCPBUGS-22741: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/thanos/pull/131

Bug OCPBUGS-25664: no detail log on signature verification failure

This is a clone of issue OCPBUGS-25055. The following is the description of the original issue:
—
Description of problem:

    No detailed failure information is logged when signature verification of the target release payload fails during an upgrade. It is unclear to the user which action to take for the failure, for example checking whether a wrong ConfigMap is set, whether the default store is unavailable, or whether there is an issue with a custom store.
 
# ./oc adm upgrade
Cluster version is 4.15.0-0.nightly-2023-12-08-202155
Upgradeable=False  

  Reason: FeatureGates_RestrictedFeatureGates_TechPreviewNoUpgrade
  Message: Cluster operator config-operator should not be upgraded between minor versions: FeatureGatesUpgradeable: "TechPreviewNoUpgrade" does not allow updates

ReleaseAccepted=False  
  Reason: RetrievePayload
  Message: Retrieving payload failed version="4.15.0-0.nightly-2023-12-09-012410" image="registry.ci.openshift.org/ocp/release@sha256:0bc9978f420a152a171429086853e80f033e012e694f9a762eee777f5a7fb4f7" failure=The update cannot be verified: unable to verify sha256:0bc9978f420a152a171429086853e80f033e012e694f9a762eee777f5a7fb4f7 against keyrings: verifier-public-key-redhat

Upstream: https://amd64.ocp.releases.ci.openshift.org/graph
Channel: stable-4.15
Recommended updates:  
  VERSION                            IMAGE
  4.15.0-0.nightly-2023-12-09-012410 registry.ci.openshift.org/ocp/release@sha256:0bc9978f420a152a171429086853e80f033e012e694f9a762eee777f5a7fb4f7
 
# ./oc -n openshift-cluster-version logs cluster-version-operator-6b7b5ff598-vxjrq|grep "verified"|tail -n4
I1211 09:28:22.755834       1 sync_worker.go:434] loadUpdatedPayload syncPayload err=The update cannot be verified: unable to verify sha256:0bc9978f420a152a171429086853e80f033e012e694f9a762eee777f5a7fb4f7 against keyrings: verifier-public-key-redhat
I1211 09:28:22.755974       1 event.go:298] Event(v1.ObjectReference{Kind:"ClusterVersion", Namespace:"openshift-cluster-version", Name:"version", UID:"", APIVersion:"config.openshift.io/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'RetrievePayloadFailed' Retrieving payload failed version="4.15.0-0.nightly-2023-12-09-012410" image="registry.ci.openshift.org/ocp/release@sha256:0bc9978f420a152a171429086853e80f033e012e694f9a762eee777f5a7fb4f7" failure=The update cannot be verified: unable to verify sha256:0bc9978f420a152a171429086853e80f033e012e694f9a762eee777f5a7fb4f7 against keyrings: verifier-public-key-redhat
I1211 09:28:37.817102       1 sync_worker.go:434] loadUpdatedPayload syncPayload err=The update cannot be verified: unable to verify sha256:0bc9978f420a152a171429086853e80f033e012e694f9a762eee777f5a7fb4f7 against keyrings: verifier-public-key-redhat
I1211 09:28:37.817488       1 event.go:298] Event(v1.ObjectReference{Kind:"ClusterVersion", Namespace:"openshift-cluster-version", Name:"version", UID:"", APIVersion:"config.openshift.io/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'RetrievePayloadFailed' Retrieving payload failed version="4.15.0-0.nightly-2023-12-09-012410" image="registry.ci.openshift.org/ocp/release@sha256:0bc9978f420a152a171429086853e80f033e012e694f9a762eee777f5a7fb4f7" failure=The update cannot be verified: unable to verify sha256:0bc9978f420a152a171429086853e80f033e012e694f9a762eee777f5a7fb4f7 against keyrings: verifier-public-key-redhat

Version-Release number of selected component (if applicable):

    4.15.0-0.nightly-2023-12-08-202155

How reproducible:

    always

Steps to Reproduce:

    1. Trigger a fresh installation with TechPreviewNoUpgrade enabled (no spec.signatureStores property is set by default).

    2. Trigger an upgrade to a nightly build (no signature is available in the default signature store).

Actual results:

    No detailed log on signature verification failure.

Expected results:

    The CVO log includes details of the signature verification failure.
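The expected behavior amounts to keeping each store's failure reason instead of collapsing them into one generic message. The Python sketch below illustrates that pattern under stated assumptions: the store names, the `verify` function, and the callables standing in for stores are all hypothetical and do not reflect the CVO's actual code or API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StoreResult:
    store: str
    reason: str


class VerificationError(Exception):
    """Signature verification failed; carries the per-store reasons."""

    def __init__(self, digest: str, failures: list[StoreResult]):
        self.digest = digest
        self.failures = failures
        details = "; ".join(f"{f.store}: {f.reason}" for f in failures)
        super().__init__(f"unable to verify {digest}: {details}")


def verify(digest: str, stores: dict[str, Callable[[str], None]]) -> str:
    """Return the name of the first store that verifies `digest`.

    Each hypothetical store callable raises an exception describing why it
    could not verify the digest; those reasons are collected, not discarded.
    """
    failures = []
    for name, check in stores.items():
        try:
            check(digest)
            return name
        except Exception as exc:  # sketch: any store error counts as a failure reason
            failures.append(StoreResult(name, str(exc)))
    raise VerificationError(digest, failures)
```

With this shape, a log line built from the raised error would say, for example, that the default store had no signature and a custom store was unreachable, which is the kind of actionable detail the bug asks for.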

Additional info:

    https://github.com/openshift/cluster-version-operator/pull/1003

https://github.com/openshift/cluster-version-operator/pull/1007

Bug OCPBUGS-5823: system:openshift:controller:service-serving-cert-controller referencing non existing serviceAccount

Description of problem:

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.20   True        False         43h     Cluster version is 4.11.20

$ oc get clusterrolebinding system:openshift:controller:service-serving-cert-controller -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  creationTimestamp: "2023-01-11T13:19:24Z"
  name: system:openshift:controller:service-serving-cert-controller
  resourceVersion: "11410"
  uid: 8b3e8c56-9f25-4f89-9159-5300585cc129
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:openshift:controller:service-serving-cert-controller
subjects:
- kind: ServiceAccount
  name: service-serving-cert-controller
  namespace: openshift-infra

$ oc get sa service-serving-cert-controller -n openshift-infra
Error from server (NotFound): serviceaccounts "service-serving-cert-controller" not found

The ServiceAccount service-serving-cert-controller does not exist, neither in openshift-infra nor in any other namespace.

It is therefore unclear what this ClusterRoleBinding does, what use case it fulfills, and why it references a nonexistent ServiceAccount.

From a security point of view, it is recommended to remove nonexistent ServiceAccounts from ClusterRoleBindings, as a potential attacker could abuse the current state by creating the missing ServiceAccount and gaining undesired permissions.
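To find other bindings in the same state, the ServiceAccount subjects of every ClusterRoleBinding can be compared against the ServiceAccounts that actually exist. The Python sketch below is a hypothetical audit helper working on already-parsed JSON (for example from `oc get clusterrolebindings -o json` and `oc get serviceaccounts -A -o json`); it does no API calls itself.

```python
from typing import Iterator


def dangling_sa_subjects(
    bindings: list[dict], service_accounts: set[tuple[str, str]]
) -> Iterator[tuple[str, str, str]]:
    """Yield (binding name, namespace, name) for ServiceAccount subjects that do not exist.

    `bindings` mirrors the `items` of a ClusterRoleBinding list, and
    `service_accounts` holds (namespace, name) pairs of existing ServiceAccounts.
    """
    for binding in bindings:
        # A ClusterRoleBinding may have no subjects at all.
        for subject in binding.get("subjects") or []:
            if subject.get("kind") != "ServiceAccount":
                continue
            key = (subject.get("namespace", ""), subject["name"])
            if key not in service_accounts:
                yield binding["metadata"]["name"], key[0], key[1]
```

Fed the binding shown above and a ServiceAccount list without openshift-infra/service-serving-cert-controller, it would report that subject as dangling.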

Version-Release number of selected component (if applicable):

OpenShift Container Platform 4 (all versions, as far as we have found)

How reproducible:

Always

Steps to Reproduce:

1. Install OpenShift Container Platform 4
2. Run oc get clusterrolebinding system:openshift:controller:service-serving-cert-controller -o yaml

Actual results:

$ oc get clusterrolebinding system:openshift:controller:service-serving-cert-controller -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  creationTimestamp: "2023-01-11T13:19:24Z"
  name: system:openshift:controller:service-serving-cert-controller
  resourceVersion: "11410"
  uid: 8b3e8c56-9f25-4f89-9159-5300585cc129
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:openshift:controller:service-serving-cert-controller
subjects:
- kind: ServiceAccount
  name: service-serving-cert-controller
  namespace: openshift-infra

$ oc get sa service-serving-cert-controller -n openshift-infra
Error from server (NotFound): serviceaccounts "service-serving-cert-controller" not found

Expected results:

Either the ServiceAccount service-serving-cert-controller exists, or the ClusterRoleBinding is removed.

Additional info:

This finding comes from a security review of OpenShift Container Platform 4 - Platform.

https://github.com/openshift/openshift-apiserver/pull/388

Bug OCPBUGS-19698: Multi-egress source route entries do not get properly updated with adminpolicybasedexternalroutes CR

Description of problem:

Multi-egress source route entries do not get properly updated with adminpolicybasedexternalroutes CR

Version-Release number of selected component (if applicable):

Upstream ovn-kubernetes commit c60963123d28075288a8c23d2796c2df89f54601

How reproducible (100%):

Create a served/application pod after creating the adminpolicybasedexternalroutes CR. The corresponding source route entries won't be added to the worker's routing table.

Steps to Reproduce:

1. Create an ovn-kubernetes kind cluster:
./kind.sh --install-cni-plugins --disable-snat-multiple-gws --multi-network-enable
2. Create two namespaces:
$ cat <<EOF | kubectl apply -f -
---
apiVersion: v1
kind: Namespace
metadata:
  name: frr
  labels:     
    gws: "true"
spec: {}
---
apiVersion: v1
kind: Namespace
metadata:
  name: bar
  labels:
    multiple_gws: "true"
spec: {}
EOF

3. Create a network attachment definition:
$ cat <<EOF | kubectl apply -f -
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: internal-net
  namespace: frr
spec:
  config: |-
    {
      "cniVersion": "0.3.1",
      "name": "internal-net",
      "plugins": [
        {
          "type": "macvlan",
          "master": "breth0",
          "mode": "bridge",
          "ipam": {
            "type": "static"
          }
        },
        {
          "capabilities": {
            "mac": true,
            "ips": true
          },
          "type": "tuning"
        }
      ]
    }
EOF

4. Create the first dummy pod:
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: dummy1
  namespace: bar
spec:
  containers:
  - name: dummy
    image: centos
    command:
      - sleep
      - infinity
  nodeSelector:
    kubernetes.io/hostname: ovn-worker2
EOF

5. Create the AdminPolicyBasedExternalRoute CR:
$ cat <<EOF | kubectl apply -f -
apiVersion: k8s.ovn.org/v1
kind: AdminPolicyBasedExternalRoute
metadata:
  name: honeypotting
spec:
## gateway example
  from:
    namespaceSelector:
      matchLabels:
          multiple_gws: "true"
  nextHops:       
    dynamic:
      - podSelector:
          matchLabels:
            gw: "true"
        bfdEnabled: true
        namespaceSelector:
          matchLabels:
            gws: "true"
        networkAttachmentName: frr/internal-net
EOF

6. Create the lb pod:
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: ext-gw
  labels:
    gw: "true"
  namespace: frr
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
        {
          "name": "internal-net",
          "ips": [ "172.18.0.10/16" ]
        }
      ]'
spec:
  containers:
  - name: frr
    image: centos
    command:
      - sleep
      - infinity
    securityContext:
      privileged: true
  nodeSelector:
    kubernetes.io/hostname: ovn-worker
EOF

7. Create a second dummy pod:
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: dummy2
  namespace: bar
spec:
  containers:
  - name: dummy
    image: centos
    command:
      - sleep
      - infinity
  nodeSelector:
    kubernetes.io/hostname: ovn-worker2
EOF

Actual results:

Only source route entries for the first dummy pod were created:

$ kubectl get po -o wide -n bar
dummy1   Running  10.244.1.3
dummy2   Running  10.244.1.4

$ POD=$(kubectl get pod -n ovn-kubernetes -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | grep ovnkube-db-) ; kubectl exec -ti $POD -n ovn-kubernetes -c nb-ovsdb -- bash

[root@ovn-control-plane ~]# ovn-nbctl lr-route-list GR_ovn-worker2
IPv4 Routes
Route Table <main>:
               10.244.1.3               172.18.0.10 src-ip exgw-rtoe-GR_ovn-worker2 ecmp-symmetric-reply bfd
         169.254.169.0/29             169.254.169.4 dst-ip rtoe-GR_ovn-worker2
            10.244.0.0/16                100.64.0.1 dst-ip
                0.0.0.0/0                172.18.0.1 dst-ip rtoe-GR_ovn-worker

Expected results:

Source route entries for both dummy pods created:
[root@ovn-control-plane ~]# ovn-nbctl lr-route-list GR_ovn-worker2
IPv4 Routes
Route Table <main>:
               10.244.1.3               172.18.0.10 src-ip exgw-rtoe-GR_ovn-worker2 ecmp-symmetric-reply bfd
               10.244.1.4               172.18.0.10 src-ip exgw-rtoe-GR_ovn-worker2 ecmp-symmetric-reply bfd
          169.254.169.0/29             169.254.169.4 dst-ip rtoe-GR_ovn-worker2
            10.244.0.0/16                100.64.0.1 dst-ip
                0.0.0.0/0                172.18.0.1 dst-ip rtoe-GR_ovn-worke
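The difference between the actual and expected tables can be checked mechanically: every pod in a namespace selected by the CR should get one src-ip route per external gateway IP, whether the pod was created before or after the CR. The Python sketch below assumes that rule and the column layout of the `ovn-nbctl lr-route-list` output shown above; the helper names are hypothetical.

```python
def src_ip_routes(route_list_output: str) -> set[tuple[str, str]]:
    """Collect (prefix, nexthop) pairs of src-ip routes from `ovn-nbctl lr-route-list` output."""
    routes = set()
    for line in route_list_output.splitlines():
        fields = line.split()
        # Route rows look like: <prefix> <nexthop> <policy> [output port] [flags...]
        if len(fields) >= 3 and fields[2] == "src-ip":
            routes.add((fields[0], fields[1]))
    return routes


def missing_source_routes(pod_ips, gateway_ips, routes: set[tuple[str, str]]):
    """Return the sorted (pod IP, gateway IP) pairs that are expected but absent."""
    expected = {(pod, gw) for pod in pod_ips for gw in gateway_ips}
    return sorted(expected - routes)
```

Using the pod IPs from `kubectl get po -o wide -n bar` and the gateway IP reported in the CR status (172.18.0.10), the actual output above would yield the missing pair for dummy2 (10.244.1.4).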

Additional info:

$ kubectl describe adminpolicybasedexternalroutes
...
Status:
  Last Transition Time:  2023-09-25T09:50:25Z
  Messages:
    Configured external gateway IPs: 172.18.0.10
    Status:  Success
Events:    <none>

https://github.com/openshift/ovn-kubernetes/pull/1923

Bug OCPBUGS-23485: eventlet dependency breaks python-dns in RHEL 9.3 rebase

The RHEL 9.3 rebase of python-dns to 2.3.0 broke at least ironic.

dnspython 2.3.0 raised AttributeError: module 'dns.rdtypes' has no attribute 'ANY' https://github.com/eventlet/eventlet/issues/781

https://github.com/openshift/ironic-image/pull/425

Bug OCPBUGS-11286: Installed Operators page crashes with "Oh no! Something went wrong." error

Description of problem:

Version-Release number of selected component (if applicable):

OCP 4.13.0-0.nightly-2023-03-23-204038
ODF 4.13.0-121.stable

How reproducible:

Steps to Reproduce:

1. Install ODF on OCP; everything is fine on the Installed Operators page.
2. Later, check the Installed Operators page again; it crashes with an "Oh no! Something went wrong" error.

Actual results:

Installed Operators page crashes with "Oh no! Something went wrong." error

Expected results:

Installed Operators page shouldn't crash

Component and stack trace logs from the console page: http://pastebin.test.redhat.com/1096522

Additional info:

https://github.com/openshift/console/pull/12810

Bug OCPBUGS-20500: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Bug OCPBUGS-26412: CPO Failing to delete default worker security group, but not reflected in HostedCluster status condition

This is a clone of issue OCPBUGS-23362. The following is the description of the original issue:
—

A HostedCluster/HostedControlPlane was stuck uninstalling. The CPO logs showed:

"error": "failed to delete AWS default security group: failed to delete security group sg-04abe599e5567b025: DependencyViolation: resource sg-04abe599e5567b025 has a dependent object\n\tstatus code: 400, request id: f776a43f-8750-4f04-95ce-457659f59095"

Unfortunately, I do not have enough access to the AWS account to inspect this security group, though I know it is the default worker security group because it is recorded in the HostedCluster's .status.platform.aws.defaultWorkerSecurityGroupID.

Version-Release number of selected component (if applicable):

4.14.1

How reproducible:

I haven't tried to reproduce it yet, but can do so and update this ticket when I do. My theory is:

Steps to Reproduce:

1. Create an AWS HostedCluster, wait for it to create/populate defaultWorkerSecurityGroupID
2. Attach the defaultWorkerSecurityGroupID to anything else in the AWS account unrelated to the HCP cluster
3. Attempt to delete the HostedCluster

Actual results:

CPO logs:
"error": "failed to delete AWS default security group: failed to delete security group sg-04abe599e5567b025: DependencyViolation: resource sg-04abe599e5567b025 has a dependent object\n\tstatus code: 400, request id: f776a43f-8750-4f04-95ce-457659f59095"

HostedCluster Status Condition
  - lastTransitionTime: "2023-11-09T22:18:09Z"
    message: ""
    observedGeneration: 3
    reason: StatusUnknown
    status: Unknown
    type: CloudResourcesDestroyed

Expected results:

I would expect that the CloudResourcesDestroyed status condition on the hostedcluster would reflect this security group as holding up the deletion instead of having to parse through logs.
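The expectation is that the deletion error ends up in the condition's message instead of an empty `StatusUnknown` entry. The Python sketch below shows that idea on a plain dict shaped like the condition above; it is not HyperShift's code, and the reason strings `AsExpected` and `CloudResourceDeletionFailed` are hypothetical placeholders.

```python
from datetime import datetime, timezone
from typing import Optional


def cloud_resources_destroyed_condition(
    error: Optional[Exception], generation: int, now: Optional[datetime] = None
) -> dict:
    """Build a CloudResourcesDestroyed condition that surfaces the blocking error.

    Assumes `now` is in UTC, since the timestamp is formatted with a "Z" suffix.
    """
    now = now or datetime.now(timezone.utc)
    if error is None:
        status, reason, message = "True", "AsExpected", ""
    else:
        # Hypothetical reason; the point is to carry the error text in `message`.
        status, reason, message = "False", "CloudResourceDeletionFailed", str(error)
    return {
        "lastTransitionTime": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "message": message,
        "observedGeneration": generation,
        "reason": reason,
        "status": status,
        "type": "CloudResourcesDestroyed",
    }
```

With the DependencyViolation error from the CPO logs passed in, the condition would name the security group that is holding up deletion.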

Additional info:

https://github.com/openshift/hypershift/pull/3381

Story HOSTEDCP-1285: Simplify kas ports exposure

As an API consumer I don't want to be able to modify ports that are implementation details and have no consumer impact on features.

As a hypershift dev I want to support all the KAS exposure features while keeping the code sustainable and extensible

Problem
Hypershift requires the KAS corev1.endpoint port to be exposed on the data plane hosts. This is so that, when traffic is resolved via the Service, we capture it on that endpoint port and let HAProxy redirect it to the load balancer that resolves to the KAS.
A while ago we introduced spec.networking.apiServer.port to let IBM choose which port is exposed on the data plane hosts, as a hardcoded one might conflict with their environment requirements.
However, as we evolved the support matrix for our endpoint publishing strategies, we mistakenly used that input as the source for exposing other ports, such as the internal Service in the HCP namespace. We also forced overwriting of the corev1.endpoint value to avoid a discrepancy with what the KAS pod was generating.

Solutions
Untangle the above by:

Never overriding the corev1.endpoint
Using spec.networking.apiServer.port only for its intended purpose, i.e. the KAS pod port.
Hiding the KAS Service port, which is an implementation detail.
If we ever need the dedicated KAS LB port to be configurable, that would be an input to the LB publishing strategy.

https://github.com/openshift/hypershift/pull/2964
https://github.com/openshift/hypershift/pull/3149
https://github.com/openshift/hypershift/pull/3147

https://github.com/openshift/hypershift/pull/3185
https://github.com/openshift/hypershift/pull/3186

Bug OCPBUGS-23765: Helm README spacing issue in dark mode

Issue 30 from https://docs.google.com/spreadsheets/d/1TR3ENY-GE_LQL9F-xH6NahtRHdu6IrYGhLqDDA8y0EI/edit#gid=1035185624

On the Helm page, after clicking the README link, margin spacing is missing in all directions.

Screenshot: https://drive.google.com/file/d/1pYFsVxJrB4m2s7pYuw1QeTW3j38A_fRT/view?usp=drive_link

https://github.com/openshift/console/pull/13370

Bug OCPBUGS-23971: After PatternFly5 update: table headers are missing at mobile resolutions

The existing tables that have hard-coded PF5 classnames don't display table headers at mobile resolutions. This is because of the inclusion of `pf-m-grid-md` alongside `pf-v5-c-table`. We should remove `pf-m-grid-md` to restore the behavior from before the PF5 upgrade.

https://github.com/openshift/console/pull/13373

Bug OCPBUGS-5728: "agent wait-for" command is not logging into .openshift_install.log file

Description of problem:

The customer deployed the OCP 4.12 RC dev-preview release using the Agent-based installer. During installation, they observed that the [<install-dir>/.openshift_install.log] file only contains the logs of the openshift-install agent create image command, while other logs, such as those of [openshift-install agent wait-for], are missing.

However, with previous IPI/UPI releases, the [wait-for] logs were available in the [.openshift_install.log] file.

The customer wants to understand whether the behavior of openshift-install changed with the agent command, and whether these logs can be made available.

Version-Release number of selected component (if applicable):

How reproducible:

Always

Steps to Reproduce:

1. Install the OCP 4.12 dev-preview release using the Agent-based installer.
2. Observe that only the logs of [agent iso create] are visible, not those of the "wait-for" command.

Actual results:

The agent wait-for logs are missing.

Expected results:

openshift-install agent wait-for install-complete should log to the .openshift_install.log file.

Additional info:

https://github.com/openshift/installer/pull/7452

Task RHOBS-956: Handle change in subscription labels metric

What

Via https://gitlab.cee.redhat.com/service/uhc-account-manager/-/merge_requests/4233, OCM renamed subscription_labels to ocm_subscription, and some (possibly many) recording rules are likely to be affected, for example https://github.com/openshift/telemeter/blob/8f091e8e7ecd3052566bd9dd20eb6991abf762c5/jsonnet/telemeter/rules.libsonnet#L34

How

Update the rules.
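Updating the rules is mostly a rename of the metric token in each expression, taking care not to touch recording-rule names or other metrics that merely contain the old name. The Python sketch below is a hypothetical helper, not part of the telemeter repository; it only renames the metric, so if label names also changed with the OCM rename, the affected expressions need further edits.

```python
import re

OLD, NEW = "subscription_labels", "ocm_subscription"

# Prometheus metric names may contain letters, digits, "_" and ":", so the old
# name is replaced only when none of those characters touch it on either side.
_OLD_METRIC = re.compile(
    r"(?<![A-Za-z0-9_:])" + re.escape(OLD) + r"(?![A-Za-z0-9_:])"
)


def rename_metric(expr: str) -> str:
    """Replace whole-token uses of the old metric name in a PromQL expression."""
    return _OLD_METRIC.sub(NEW, expr)
```

Names such as `subscription_labels_total` or a recording rule like `my:subscription_labels:sum` are left untouched by design.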

https://github.com/openshift/telemeter/pull/495

Bug OCPBUGS-19037: agent-tui failure blocks ssh + console login

The agent-interactive-console service is required by both sshd and systemd-logind, so if it exits with an error code there is no way to connect or log in to the box to debug.

https://github.com/openshift/installer/pull/7490

Bug OCPBUGS-21863: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/252

Bug OCPBUGS-21969: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/builder/pull/361

Bug OCPBUGS-22385: vmware-vsphere-csi-driver-webhook handles HTTP/2 requests

Description of problem:

Currently, vmware-vsphere-csi-driver-webhook exposes HTTP/2 endpoints:

$ oc -n openshift-cluster-csi-drivers exec deployment/vmware-vsphere-csi-driver-webhook -- curl -kv   https://localhost:8443/readyz

...
* ALPN, server accepted to use h2
> GET /readyz HTTP/2
< HTTP/2 404

To err on the side of caution, we should discontinue the handling of HTTP/2 requests.

Version-Release number of selected component (if applicable):

4.15

How reproducible:

Always

Steps to Reproduce:

1. oc -n openshift-cluster-csi-drivers exec deployment/vmware-vsphere-csi-driver-webhook -- curl -kv https://localhost:8443/readyz

Actual results:

HTTP/2 requests are accepted

Expected results:

HTTP/2 requests should not be accepted by the webhook.
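To verify the fix with the reproduction command above, it is enough to check which protocol curl actually used for the request. The Python sketch below is a hypothetical QE helper that inspects `curl -v` output; it assumes curl's convention of prefixing sent request lines with "> ", as in the output shown in the description.

```python
def negotiated_http2(curl_verbose_output: str) -> bool:
    """Return True if `curl -v` output shows the request was sent over HTTP/2."""
    for line in curl_verbose_output.splitlines():
        # curl prefixes request lines with "> ", e.g. "> GET /readyz HTTP/2".
        if line.startswith("> ") and line.rstrip().endswith(" HTTP/2"):
            return True
    return False
```

Against the output in the description this returns True; once the webhook stops offering h2, curl falls back to HTTP/1.1 and the check returns False.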

Additional info:

https://github.com/openshift/vmware-vsphere-csi-driver-operator/pull/179

Bug OCPBUGS-17682: topologySpreadConstraints for UWM prometheus-operator does not work

Description of problem:

The in-cluster prometheus-operator and UWM prometheus-operator pods are both scheduled to master nodes; see:

https://github.com/openshift/cluster-monitoring-operator/blob/release-4.14/assets/prometheus-operator/deployment.yaml#L88-L97

https://github.com/openshift/cluster-monitoring-operator/blob/release-4.14/assets/prometheus-operator-user-workload/deployment.yaml#L91-L103

Enable UWM and add topologySpreadConstraints for both the in-cluster and the UWM prometheus-operator (with topologyKey set to node-role.kubernetes.io/master). The topologySpreadConstraints take effect for the in-cluster prometheus-operator, but not for the UWM prometheus-operator.

apiVersion: v1
data:
  config.yaml: |
    enableUserWorkload: true
    prometheusOperator:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: node-role.kubernetes.io/master
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: prometheus-operator
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring

For the in-cluster prometheus-operator, the topologySpreadConstraints settings are applied to the prometheus-operator deployment and pod:

$ oc -n openshift-monitoring get deploy prometheus-operator -oyaml | grep topologySpreadConstraints -A7
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/name: prometheus-operator
        maxSkew: 1
        topologyKey: node-role.kubernetes.io/master
        whenUnsatisfiable: DoNotSchedule
      volumes:

$ oc -n openshift-monitoring get pod -l app.kubernetes.io/name=prometheus-operator -o wide
NAME                                   READY   STATUS    RESTARTS   AGE    IP            NODE                                                 NOMINATED NODE   READINESS GATES
prometheus-operator-65496d5b78-fb9nq   2/2     Running   0          105s   10.128.0.71   juzhao-0813-szb9h-master-0.c.openshift-qe.internal   <none>           <none>

$ oc -n openshift-monitoring get pod prometheus-operator-65496d5b78-fb9nq -oyaml | grep topologySpreadConstraints -A7
    topologySpreadConstraints:
    - labelSelector:
        matchLabels:
          app.kubernetes.io/name: prometheus-operator
      maxSkew: 1
      topologyKey: node-role.kubernetes.io/master
      whenUnsatisfiable: DoNotSchedule
    volumes:

However, the topologySpreadConstraints settings are not applied to the UWM prometheus-operator deployment or pod:

$ oc -n openshift-user-workload-monitoring get cm user-workload-monitoring-config -oyaml
apiVersion: v1
data:
  config.yaml: |
    prometheusOperator:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: node-role.kubernetes.io/master
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: prometheus-operator
kind: ConfigMap
metadata:
  creationTimestamp: "2023-08-14T08:10:49Z"
  labels:
    app.kubernetes.io/managed-by: cluster-monitoring-operator
    app.kubernetes.io/part-of: openshift-monitoring
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
  resourceVersion: "212490"
  uid: 048f91cb-4da6-4b1b-9e1f-c769096ab88c

$ oc -n openshift-user-workload-monitoring get deploy prometheus-operator -oyaml | grep topologySpreadConstraints -A7
no result

$ oc -n openshift-user-workload-monitoring get pod -l app.kubernetes.io/name=prometheus-operator
NAME                                   READY   STATUS    RESTARTS   AGE
prometheus-operator-77bcdcbd9c-m5x8z   2/2     Running   0          15m

$ oc -n openshift-user-workload-monitoring get pod prometheus-operator-77bcdcbd9c-m5x8z -oyaml | grep topologySpreadConstraints
no result

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-08-11-055332

How reproducible:

always

Steps to Reproduce:

1. See the description above.

Actual results:

The topologySpreadConstraints settings are not applied to the UWM prometheus-operator deployment or pod.

Expected results:

The topologySpreadConstraints settings are applied to the UWM prometheus-operator deployment and pod.
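The expected behavior can be sketched as copying the prometheusOperator.topologySpreadConstraints from a parsed config.yaml onto the pod template of the matching deployment. This is a minimal illustration under that assumption only: the function name is hypothetical, and it is not cluster-monitoring-operator's actual code.

```python
import copy
from typing import Any, Dict


def apply_prometheus_operator_constraints(deployment: Dict[str, Any], config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `deployment` whose pod template carries the
    prometheusOperator.topologySpreadConstraints from a parsed config.yaml.
    Hypothetical helper for illustration only."""
    result = copy.deepcopy(deployment)  # never mutate the input object
    constraints = (config.get("prometheusOperator") or {}).get("topologySpreadConstraints")
    if constraints:
        pod_spec = (
            result.setdefault("spec", {})
            .setdefault("template", {})
            .setdefault("spec", {})
        )
        pod_spec["topologySpreadConstraints"] = copy.deepcopy(constraints)
    return result


# Parsed form of the user-workload-monitoring-config config.yaml from this report.
uwm_config = {
    "prometheusOperator": {
        "topologySpreadConstraints": [
            {
                "maxSkew": 1,
                "topologyKey": "node-role.kubernetes.io/master",
                "whenUnsatisfiable": "DoNotSchedule",
                "labelSelector": {"matchLabels": {"app.kubernetes.io/name": "prometheus-operator"}},
            }
        ]
    }
}
uwm_deployment = {"metadata": {"name": "prometheus-operator", "namespace": "openshift-user-workload-monitoring"}}
rendered = apply_prometheus_operator_constraints(uwm_deployment, uwm_config)
print(rendered["spec"]["template"]["spec"]["topologySpreadConstraints"][0]["topologyKey"])
```

The same function applies unchanged to both the in-cluster and the UWM deployment, which is the symmetry the expected result asks for.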

https://github.com/openshift/cluster-monitoring-operator/pull/2072

Bug OCPBUGS-24102: Update 4.15 ose-cluster-autoscaler-operator-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-autoscaler-operator/pull/302

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-autoscaler-operator/pull/302

Bug OCPBUGS-27217: [4.15] Add suite to openshift origin

Description of problem:

Backport of the live migration test suite in origin to 4.15.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:

https://github.com/openshift/origin/pull/28524

Bug OCPBUGS-26239: pathological events test failed multiple times for ns/openshift-kube-scheduler

This is a clone of issue OCPBUGS-24537. The following is the description of the original issue:
—
Description of problem:

4.15 nightly payloads have been affected by this test multiple times:

: [sig-arch] events should not repeat pathologically for ns/openshift-kube-scheduler 0s
{ 1 events happened too frequently

event happened 21 times, something is wrong: namespace/openshift-kube-scheduler node/ci-op-2gywzc86-aa265-5skmk-master-1 pod/openshift-kube-scheduler-guard-ci-op-2gywzc86-aa265-5skmk-master-1 hmsg/2652c73da5 - reason/ProbeError Readiness probe error: Get "https://10.0.0.7:10259/healthz": dial tcp 10.0.0.7:10259: connect: connection refused result=reject
body:
 From: 08:41:08Z To: 08:41:09Z}

In each aggregated run of 10 jobs, 2 to 3 jobs failed this test. Historically this test passed 100% of the time, but over the past two days the pass rate has dropped to 97%, and the aggregator started allowing this failure in the latest payload: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/aggregated-azure-ovn-upgrade-4.15-micro-release-openshift-release-analysis-aggregator/1732295947339173888

The first payload in which this appeared is https://amd64.ocp.releases.ci.openshift.org/releasestream/4.15.0-0.nightly/release/4.15.0-0.nightly-2023-12-05-071627.

All the events occurred while cluster-operator/kube-scheduler was progressing.

For comparison, here is a passed job: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade/1731936539870498816

Here is a failed one: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade/1731936538192777216

Both jobs have the same set of probe error events. In the passing job each event repeated fewer than 20 times, whereas in the failed job one of those events repeated more than 20 times, which caused the test failure.
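The counting rule described above can be sketched as follows. This is a minimal illustration, assuming a distinct event is identified by its namespace, locator, reason, and message, and that the test fails once any one of them repeats more than 20 times (a threshold inferred from this report: passing jobs stayed under 20 repeats while the failing job hit 21). origin's actual event keying, threshold, and allowances may differ.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Assumed threshold, inferred from this report rather than taken from origin's code.
REPEAT_THRESHOLD = 20

# (namespace, locator, reason, message): an assumed way of identifying "the same" event.
EventKey = Tuple[str, str, str, str]


def pathological_events(events: Iterable[EventKey], threshold: int = REPEAT_THRESHOLD) -> Dict[EventKey, int]:
    """Return each distinct event that repeated more than `threshold` times."""
    counts = Counter(events)
    return {key: n for key, n in counts.items() if n > threshold}


probe_error = (
    "openshift-kube-scheduler",
    "pod/openshift-kube-scheduler-guard-ci-op-2gywzc86-aa265-5skmk-master-1",
    "ProbeError",
    'Readiness probe error: Get "https://10.0.0.7:10259/healthz": dial tcp 10.0.0.7:10259: connect: connection refused',
)
failed_job = [probe_error] * 21   # same event seen 21 times
passing_job = [probe_error] * 19  # same event seen 19 times
print(pathological_events(failed_job))   # one entry, with count 21
print(pathological_events(passing_job))  # {}
```

Because the same probe errors occur in both jobs, a small shift in how often the guard pod's readiness probe fails during kube-scheduler progression is enough to move a job across the threshold.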

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:

https://github.com/openshift/origin/pull/28504

Bug OCPBUGS-20362: Delete results results.tekton.dev annotations on rerun of PipelineRuns

Description of problem:

Creating a PipelineRun that carries over the previous run's annotations prevents a new result from being created, although the records are still updated with the new TaskRuns.

https://github.com/tektoncd/results/issues/556

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1. Install TektonResults on the cluster.
2. Create a Pipeline and start it.
3. Rerun the PipelineRun.
4. Check the records endpoint, e.g.: https://tekton-results-api-service-openshift-pipelines.apps.viraj-11-10-2023.devcluster.openshift.com/apis/results.tekton.dev/v1alpha2/parents/viraj/results/-/records
The new PipelineRun is not saved.

Actual results:

The new PipelineRun created by the rerun is not saved in the records.

Expected results:

All PipelineRuns should be saved in the records.

Additional info:

Instructions for installing TektonResults on the cluster: https://gist.github.com/vikram-raj/257d672a38eb2159b0368eaed8f8970a
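The behavior the title asks for can be sketched as filtering the copied annotations before a rerun. This is a minimal illustration, assuming every annotation written by Tekton Results shares the results.tekton.dev/ prefix (for example results.tekton.dev/result and results.tekton.dev/record); the function name and the sample values are hypothetical, and this is not the console's implementation.

```python
from typing import Dict, Optional

# Assumption: all Tekton Results annotations use this key prefix.
RESULTS_ANNOTATION_PREFIX = "results.tekton.dev/"


def annotations_for_rerun(annotations: Optional[Dict[str, str]]) -> Dict[str, str]:
    """Copy a PipelineRun's annotations for a rerun, dropping the Tekton Results
    ones so the new run is stored as a new result instead of pointing at the old one."""
    return {
        key: value
        for key, value in (annotations or {}).items()
        if not key.startswith(RESULTS_ANNOTATION_PREFIX)
    }


previous = {
    "results.tekton.dev/result": "<namespace>/results/<uid>",                 # hypothetical value
    "results.tekton.dev/record": "<namespace>/results/<uid>/records/<uid>",   # hypothetical value
    "example.com/team": "demo",                                               # hypothetical unrelated annotation
}
print(annotations_for_rerun(previous))
```

Unrelated annotations are kept, so only the link to the previous result is severed.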

https://github.com/openshift/console/pull/13230

Bug OCPBUGS-23466: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/hypershift/pull/3218

Story OCPCLOUD-2169: Test in Origin suites that CPMS does not rollout

Background

The origin test suite does not test CPMS, so a CPMS rollout should never occur during a run.

We should add a test that checks, early in the suite, that the control plane machines are all named <cluster-name>-master-<index>. If we see any control plane machine matching <cluster-name>-master-<random>-<index>, we know that the CPMS has rolled out, and the test should abort until we work out why the CPMS rolled out.

The hope here is that it becomes very obvious when there are issues with CPMS, even when these issues are introduced by other repositories.

Steps

Add test to origin as per above description
Discuss with TRT to make sure the test runs in both regular and serial jobs, and runs early enough to detect the issue before other failures can kill the job
Tests will be added into https://github.com/openshift/origin/tree/master/test/extended/machines
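The naming check described above can be sketched with two anchored regular expressions. This is a minimal illustration under the naming patterns stated in this card; the function names are hypothetical, the <random> segment is assumed to be lowercase alphanumeric, and the real test would live in origin's Go test suite.

```python
import re
from typing import Iterable


def classify_control_plane_machine(cluster_name: str, machine_name: str) -> str:
    """Classify a control plane Machine name.

    "original": <cluster-name>-master-<index>
    "replaced": <cluster-name>-master-<random>-<index>  (indicates a CPMS rollout)
    "unknown":  anything else
    """
    prefix = re.escape(cluster_name) + "-master-"
    if re.fullmatch(prefix + r"\d+", machine_name):
        return "original"
    if re.fullmatch(prefix + r"[a-z0-9]+-\d+", machine_name):  # assumed shape of <random>
        return "replaced"
    return "unknown"


def assert_no_cpms_rollout(cluster_name: str, machine_names: Iterable[str]) -> None:
    """Fail loudly if any control plane machine looks like a CPMS replacement."""
    replaced = [m for m in machine_names
                if classify_control_plane_machine(cluster_name, m) == "replaced"]
    if replaced:
        raise AssertionError(f"CPMS appears to have rolled out; replacement machines: {replaced}")


assert_no_cpms_rollout("mycluster-abc12", ["mycluster-abc12-master-0",
                                            "mycluster-abc12-master-1",
                                            "mycluster-abc12-master-2"])
```

Anchoring both patterns with fullmatch keeps an original name such as ...-master-0 from ever matching the replacement pattern, since that pattern requires an extra segment before the index.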

Stakeholders

Cluster infra
Installer
TRT

Definition of Done

Origin tests fail with an obvious CPMS-related error when the CPMS attempts a rollout

Docs

<Add docs requirements for this card>

Testing

<Explain testing that will be added>

Task HOSTEDCP-1312: Fix update-codegen.sh to work locally

https://github.com/openshift/hypershift/pull/3214

Bug OCPBUGS-17391: Rollout of ovnk pods is taking more time

The pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-local-to-shared-gateway-mode-migration job recently started failing because the ovnkube-master daemonset did not finish rolling out within 360s.

The must-gather, which is collected a few minutes after the test failure, shows that the daemonset is still not ready, so I believe increasing the timeout is not the answer.

Some debug info:

➜ static-kas git:(master) oc --kubeconfig=/tmp/kk get daemonsets -A 
NAMESPACE                                NAME                              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                                         AGE
openshift-cluster-csi-drivers            aws-ebs-csi-driver-node           6         6         6       6            6           kubernetes.io/os=linux                                                8h
openshift-cluster-node-tuning-operator   tuned                             6         6         6       6            6           kubernetes.io/os=linux                                                8h
openshift-dns                            dns-default                       6         6         6       6            6           kubernetes.io/os=linux                                                8h
openshift-dns                            node-resolver                     6         6         6       6            6           kubernetes.io/os=linux                                                8h
openshift-image-registry                 node-ca                           6         6         6       6            6           kubernetes.io/os=linux                                                8h
openshift-ingress-canary                 ingress-canary                    3         3         3       3            3           kubernetes.io/os=linux                                                8h
openshift-machine-api                    machine-api-termination-handler   0         0         0       0            0           kubernetes.io/os=linux,machine.openshift.io/interruptible-instance=   8h
openshift-machine-config-operator        machine-config-daemon             6         6         6       6            6           kubernetes.io/os=linux                                                8h
openshift-machine-config-operator        machine-config-server             3         3         3       3            3           node-role.kubernetes.io/master=                                       8h
openshift-monitoring                     node-exporter                     6         6         6       6            6           kubernetes.io/os=linux                                                8h
openshift-multus                         multus                            6         6         6       6            6           kubernetes.io/os=linux                                                9h
openshift-multus                         multus-additional-cni-plugins     6         6         6       6            6           kubernetes.io/os=linux                                                9h
openshift-multus                         network-metrics-daemon            6         6         6       6            6           kubernetes.io/os=linux                                                9h
openshift-network-diagnostics            network-check-target              6         6         6       6            6           beta.kubernetes.io/os=linux                                           9h
openshift-ovn-kubernetes                 ovnkube-master                    3         3         2       2            2           beta.kubernetes.io/os=linux,node-role.kubernetes.io/master=           9h
openshift-ovn-kubernetes                 ovnkube-node                      6         6         6       6            6           beta.kubernetes.io/os=linux                                           9h
Name:           ovnkube-master
Selector:       app=ovnkube-master
Node-Selector:  beta.kubernetes.io/os=linux,node-role.kubernetes.io/master=
Labels:         networkoperator.openshift.io/generates-operator-status=stand-alone
Annotations:    deprecated.daemonset.template.generation: 3
                kubernetes.io/description: This daemonset launches the ovn-kubernetes controller (master) networking components.
                networkoperator.openshift.io/cluster-network-cidr: 10.128.0.0/14
                networkoperator.openshift.io/hybrid-overlay-status: disabled
                networkoperator.openshift.io/ip-family-mode: single-stack
                release.openshift.io/version: 4.14.0-0.ci.test-2023-08-04-123014-ci-op-c6fp05f4-latest
Desired Number of Nodes Scheduled: 3
Current Number of Nodes Scheduled: 3
Number of Nodes Scheduled with Up-to-date Pods: 2
Number of Nodes Scheduled with Available Pods: 2
Number of Nodes Misscheduled: 0
Pods Status:  3 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:       app=ovnkube-master
                component=network
                kubernetes.io/os=linux
                openshift.io/component=network
                ovn-db-pod=true
                type=infra
  Annotations:  networkoperator.openshift.io/cluster-network-cidr: 10.128.0.0/14
                networkoperator.openshift.io/hybrid-overlay-status: disabled
                networkoperator.openshift.io/ip-family-mode: single-stack
                target.workload.openshift.io/management:
                  {"effect": "PreferredDuringScheduling"}
  Service Account:  ovn-kubernetes-controller

It seems that one pod is not fully coming up: two of its containers (sbdb and nbdb) are not ready. Logs from those containers are below:

➜ static-kas git:(master) oc --kubeconfig=/tmp/kk describe pod ovnkube-master-7qlm5 -n openshift-ovn-kubernetes | rg '^ [a-z].*:|Ready'
  northd:
    Ready: True
  nbdb:
    Ready: False
  kube-rbac-proxy:
    Ready: True
  sbdb:
    Ready: False
  ovnkube-master:
    Ready: True
  ovn-dbchecker:
    Ready: True
➜ static-kas git:(master) oc --kubeconfig=/tmp/kk logs ovnkube-master-7qlm5 -n openshift-ovn-kubernetes -c sbdb
2023-08-04T13:08:49.127480354Z + [[ -f /env/_master ]]
2023-08-04T13:08:49.127562165Z + trap quit TERM INT
2023-08-04T13:08:49.127609496Z + ovn_kubernetes_namespace=openshift-ovn-kubernetes
2023-08-04T13:08:49.127637926Z + ovndb_ctl_ssl_opts='-p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt'
2023-08-04T13:08:49.127637926Z + transport=ssl
2023-08-04T13:08:49.127645167Z + ovn_raft_conn_ip_url_suffix=
2023-08-04T13:08:49.127682687Z + [[ 10.0.42.108 == \: ]]
2023-08-04T13:08:49.127690638Z + db=sb
2023-08-04T13:08:49.127690638Z + db_port=9642
2023-08-04T13:08:49.127712038Z + ovn_db_file=/etc/ovn/ovnsb_db.db
2023-08-04T13:08:49.127854181Z + [[ ! ssl:10.0.102.2:9642,ssl:10.0.42.108:9642,ssl:10.0.74.128:9642 =~ .:10\.0\.42\.108:. ]]
2023-08-04T13:08:49.128199437Z ++ bracketify 10.0.42.108
2023-08-04T13:08:49.128237768Z ++ case "$1" in
2023-08-04T13:08:49.128265838Z ++ echo 10.0.42.108
2023-08-04T13:08:49.128493242Z + OVN_ARGS='--db-sb-cluster-local-port=9644 --db-sb-cluster-local-addr=10.0.42.108 --no-monitor --db-sb-cluster-local-proto=ssl --ovn-sb-db-ssl-key=/ovn-cert/tls.key --ovn-sb-db-ssl-cert=/ovn-cert/tls.crt --ovn-sb-db-ssl-ca-cert=/ovn-ca/ca-bundle.crt'
2023-08-04T13:08:49.128535253Z + CLUSTER_INITIATOR_IP=10.0.102.2
2023-08-04T13:08:49.128819438Z ++ date -Iseconds
2023-08-04T13:08:49.130157063Z 2023-08-04T13:08:49+00:00 - starting sbdb CLUSTER_INITIATOR_IP=10.0.102.2
2023-08-04T13:08:49.130170893Z + echo '2023-08-04T13:08:49+00:00 - starting sbdb CLUSTER_INITIATOR_IP=10.0.102.2'
2023-08-04T13:08:49.130170893Z + initialize=false
2023-08-04T13:08:49.130179713Z + [[ ! -e /etc/ovn/ovnsb_db.db ]]
2023-08-04T13:08:49.130318475Z + [[ false == \t\r\u\e ]]
2023-08-04T13:08:49.130406657Z + wait 9
2023-08-04T13:08:49.130493659Z + exec /usr/share/ovn/scripts/ovn-ctl --db-sb-cluster-local-port=9644 --db-sb-cluster-local-addr=10.0.42.108 --no-monitor --db-sb-cluster-local-proto=ssl --ovn-sb-db-ssl-key=/ovn-cert/tls.key --ovn-sb-db-ssl-cert=/ovn-cert/tls.crt --ovn-sb-db-ssl-ca-cert=/ovn-ca/ca-bundle.crt '--ovn-sb-log=-vconsole:info -vfile:off -vPATTERN:console:%D{%Y-%m-%dT%H:%M:%S.###Z}|%05N|%c%T|%p|%m' run_sb_ovsdb
2023-08-04T13:08:49.208399304Z 2023-08-04T13:08:49.208Z|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-sb.log
2023-08-04T13:08:49.213507987Z ovn-sbctl: unix:/var/run/ovn/ovnsb_db.sock: database connection failed (No such file or directory)
2023-08-04T13:08:49.224890005Z 2023-08-04T13:08:49Z|00001|reconnect|INFO|unix:/var/run/ovn/ovnsb_db.sock: connecting...
2023-08-04T13:08:49.224912156Z 2023-08-04T13:08:49Z|00002|reconnect|INFO|unix:/var/run/ovn/ovnsb_db.sock: connection attempt failed (No such file or directory)
2023-08-04T13:08:49.255474964Z 2023-08-04T13:08:49.255Z|00002|raft|INFO|local server ID is 7f92
2023-08-04T13:08:49.333342909Z 2023-08-04T13:08:49.333Z|00003|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 3.1.2
2023-08-04T13:08:49.348948944Z 2023-08-04T13:08:49.348Z|00004|reconnect|INFO|ssl:10.0.102.2:9644: connecting...
2023-08-04T13:08:49.349002565Z 2023-08-04T13:08:49.348Z|00005|reconnect|INFO|ssl:10.0.74.128:9644: connecting...
2023-08-04T13:08:49.352510569Z 2023-08-04T13:08:49.352Z|00006|reconnect|INFO|ssl:10.0.102.2:9644: connected
2023-08-04T13:08:49.353870484Z 2023-08-04T13:08:49.353Z|00007|reconnect|INFO|ssl:10.0.74.128:9644: connected
2023-08-04T13:08:49.889326777Z 2023-08-04T13:08:49.889Z|00008|raft|INFO|server 2501 is leader for term 5
2023-08-04T13:08:49.890316765Z 2023-08-04T13:08:49.890Z|00009|raft|INFO|rejecting append_request because previous entry 5,1538 not in local log (mismatch past end of log)
2023-08-04T13:08:49.891199951Z 2023-08-04T13:08:49.891Z|00010|raft|INFO|rejecting append_request because previous entry 5,1539 not in local log (mismatch past end of log)
2023-08-04T13:08:50.225632838Z 2023-08-04T13:08:50Z|00003|reconnect|INFO|unix:/var/run/ovn/ovnsb_db.sock: connecting...
2023-08-04T13:08:50.225677739Z 2023-08-04T13:08:50Z|00004|reconnect|INFO|unix:/var/run/ovn/ovnsb_db.sock: connected
2023-08-04T13:08:50.227772827Z Waiting for OVN_Southbound to come up.
2023-08-04T13:08:55.716284614Z 2023-08-04T13:08:55.716Z|00011|raft|INFO|ssl:10.0.74.128:43498: learned server ID 3dff
2023-08-04T13:08:55.716323395Z 2023-08-04T13:08:55.716Z|00012|raft|INFO|ssl:10.0.74.128:43498: learned remote address ssl:10.0.74.128:9644
2023-08-04T13:08:55.724570375Z 2023-08-04T13:08:55.724Z|00013|raft|INFO|ssl:10.0.102.2:47804: learned server ID 2501
2023-08-04T13:08:55.724599466Z 2023-08-04T13:08:55.724Z|00014|raft|INFO|ssl:10.0.102.2:47804: learned remote address ssl:10.0.102.2:9644
2023-08-04T13:08:59.348572779Z 2023-08-04T13:08:59.348Z|00015|memory|INFO|32296 kB peak resident set size after 10.1 seconds
2023-08-04T13:08:59.348648190Z 2023-08-04T13:08:59.348Z|00016|memory|INFO|atoms:35959 cells:31476 monitors:0 n-weak-refs:749 raft-connections:4 raft-log:1543 txn-history:100 txn-history-atoms:7100
➜ static-kas git:(master) oc --kubeconfig=/tmp/kk logs ovnkube-master-7qlm5 -n openshift-ovn-kubernetes -c nbdb 
2023-08-04T13:08:48.779743434Z + [[ -f /env/_master ]]
2023-08-04T13:08:48.779743434Z + trap quit TERM INT
2023-08-04T13:08:48.779825516Z + ovn_kubernetes_namespace=openshift-ovn-kubernetes
2023-08-04T13:08:48.779825516Z + ovndb_ctl_ssl_opts='-p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt'
2023-08-04T13:08:48.779825516Z + transport=ssl
2023-08-04T13:08:48.779825516Z + ovn_raft_conn_ip_url_suffix=
2023-08-04T13:08:48.779825516Z + [[ 10.0.42.108 == \: ]]
2023-08-04T13:08:48.779825516Z + db=nb
2023-08-04T13:08:48.779825516Z + db_port=9641
2023-08-04T13:08:48.779825516Z + ovn_db_file=/etc/ovn/ovnnb_db.db
2023-08-04T13:08:48.779887606Z + [[ ! ssl:10.0.102.2:9641,ssl:10.0.42.108:9641,ssl:10.0.74.128:9641 =~ .:10\.0\.42\.108:. ]]
2023-08-04T13:08:48.780159182Z ++ bracketify 10.0.42.108
2023-08-04T13:08:48.780167142Z ++ case "$1" in
2023-08-04T13:08:48.780172102Z ++ echo 10.0.42.108
2023-08-04T13:08:48.780314224Z + OVN_ARGS='--db-nb-cluster-local-port=9643 --db-nb-cluster-local-addr=10.0.42.108 --no-monitor --db-nb-cluster-local-proto=ssl --ovn-nb-db-ssl-key=/ovn-cert/tls.key --ovn-nb-db-ssl-cert=/ovn-cert/tls.crt --ovn-nb-db-ssl-ca-cert=/ovn-ca/ca-bundle.crt'
2023-08-04T13:08:48.780314224Z + CLUSTER_INITIATOR_IP=10.0.102.2
2023-08-04T13:08:48.780518588Z ++ date -Iseconds
2023-08-04T13:08:48.781738820Z 2023-08-04T13:08:48+00:00 - starting nbdb CLUSTER_INITIATOR_IP=10.0.102.2, K8S_NODE_IP=10.0.42.108
2023-08-04T13:08:48.781753021Z + echo '2023-08-04T13:08:48+00:00 - starting nbdb CLUSTER_INITIATOR_IP=10.0.102.2, K8S_NODE_IP=10.0.42.108'
2023-08-04T13:08:48.781753021Z + initialize=false
2023-08-04T13:08:48.781753021Z + [[ ! -e /etc/ovn/ovnnb_db.db ]]
2023-08-04T13:08:48.781816342Z + [[ false == \t\r\u\e ]]
2023-08-04T13:08:48.781936684Z + wait 9
2023-08-04T13:08:48.781974715Z + exec /usr/share/ovn/scripts/ovn-ctl --db-nb-cluster-local-port=9643 --db-nb-cluster-local-addr=10.0.42.108 --no-monitor --db-nb-cluster-local-proto=ssl --ovn-nb-db-ssl-key=/ovn-cert/tls.key --ovn-nb-db-ssl-cert=/ovn-cert/tls.crt --ovn-nb-db-ssl-ca-cert=/ovn-ca/ca-bundle.crt '--ovn-nb-log=-vconsole:info -vfile:off -vPATTERN:console:%D{%Y-%m-%dT%H:%M:%S.###Z}|%05N|%c%T|%p|%m' run_nb_ovsdb
2023-08-04T13:08:48.851644059Z 2023-08-04T13:08:48.851Z|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-nb.log
2023-08-04T13:08:48.852091247Z ovn-nbctl: unix:/var/run/ovn/ovnnb_db.sock: database connection failed (No such file or directory)
2023-08-04T13:08:48.861365357Z 2023-08-04T13:08:48Z|00001|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connecting...
2023-08-04T13:08:48.861365357Z 2023-08-04T13:08:48Z|00002|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connection attempt failed (No such file or directory)
2023-08-04T13:08:48.875126148Z 2023-08-04T13:08:48.875Z|00002|raft|INFO|local server ID is c503
2023-08-04T13:08:48.911846610Z 2023-08-04T13:08:48.911Z|00003|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 3.1.2
2023-08-04T13:08:48.918864408Z 2023-08-04T13:08:48.918Z|00004|reconnect|INFO|ssl:10.0.102.2:9643: connecting...
2023-08-04T13:08:48.918934490Z 2023-08-04T13:08:48.918Z|00005|reconnect|INFO|ssl:10.0.74.128:9643: connecting...
2023-08-04T13:08:48.923439162Z 2023-08-04T13:08:48.923Z|00006|reconnect|INFO|ssl:10.0.102.2:9643: connected
2023-08-04T13:08:48.925166154Z 2023-08-04T13:08:48.925Z|00007|reconnect|INFO|ssl:10.0.74.128:9643: connected
2023-08-04T13:08:49.861650961Z 2023-08-04T13:08:49Z|00003|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connecting...
2023-08-04T13:08:49.861747153Z 2023-08-04T13:08:49Z|00004|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connected
2023-08-04T13:08:49.875272530Z 2023-08-04T13:08:49.875Z|00008|raft|INFO|server fccb is leader for term 6
2023-08-04T13:08:49.875302480Z 2023-08-04T13:08:49.875Z|00009|raft|INFO|rejecting append_request because previous entry 6,1732 not in local log (mismatch past end of log)
2023-08-04T13:08:49.876027164Z Waiting for OVN_Northbound to come up.
2023-08-04T13:08:55.694760761Z 2023-08-04T13:08:55.694Z|00010|raft|INFO|ssl:10.0.74.128:57122: learned server ID d382
2023-08-04T13:08:55.694800872Z 2023-08-04T13:08:55.694Z|00011|raft|INFO|ssl:10.0.74.128:57122: learned remote address ssl:10.0.74.128:9643
2023-08-04T13:08:55.706904913Z 2023-08-04T13:08:55.706Z|00012|raft|INFO|ssl:10.0.102.2:43230: learned server ID fccb
2023-08-04T13:08:55.706931733Z 2023-08-04T13:08:55.706Z|00013|raft|INFO|ssl:10.0.102.2:43230: learned remote address ssl:10.0.102.2:9643
2023-08-04T13:08:58.919567770Z 2023-08-04T13:08:58.919Z|00014|memory|INFO|21944 kB peak resident set size after 10.1 seconds
2023-08-04T13:08:58.919643762Z 2023-08-04T13:08:58.919Z|00015|memory|INFO|atoms:8471 cells:7481 monitors:0 n-weak-refs:200 raft-connections:4 raft-log:1737 txn-history:72 txn-history-atoms:8165
➜ static-kas git:(master)

This now happens very frequently, but it did not happen before around July 21st.
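Whether a DaemonSet has "finished rolling out" comes down to a few status counters. The sketch below is modeled on the checks `kubectl rollout status` performs for DaemonSets; the CI job's own wait logic may differ. It is fed the counts from the ovnkube-master describe output above, while the generation values are assumed because they are not shown.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class DaemonSetStatus:
    generation: int                # metadata.generation
    observed_generation: int       # status.observedGeneration
    desired_number_scheduled: int  # status.desiredNumberScheduled
    updated_number_scheduled: int  # status.updatedNumberScheduled
    number_available: int          # status.numberAvailable


def rollout_status(s: DaemonSetStatus) -> Tuple[bool, str]:
    """Return (done, message), modeled on `kubectl rollout status` for DaemonSets."""
    if s.generation > s.observed_generation:
        return False, "waiting for daemon set spec update to be observed"
    if s.updated_number_scheduled < s.desired_number_scheduled:
        return False, (f"{s.updated_number_scheduled} out of {s.desired_number_scheduled} "
                       "new pods have been updated")
    if s.number_available < s.desired_number_scheduled:
        return False, (f"{s.number_available} of {s.desired_number_scheduled} "
                       "updated pods are available")
    return True, "successfully rolled out"


# Counts from the describe output above; generation/observed_generation are assumed.
ovnkube_master = DaemonSetStatus(generation=3, observed_generation=3,
                                 desired_number_scheduled=3,
                                 updated_number_scheduled=2,
                                 number_available=2)
print(rollout_status(ovnkube_master))
# (False, '2 out of 3 new pods have been updated')
```

With the third pod stuck on nbdb/sbdb readiness, these counters never reach 3, so no timeout value would let the wait succeed. That matches the observation that increasing the timeout is not the answer.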

https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-local-to-shared-gateway-mode-migration?buildId=1684628739427667968

https://github.com/openshift/cluster-network-operator/pull/1978

Bug OCPBUGS-24096: Update 4.15 ose-csi-driver-shared-resource-webhook-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/csi-driver-shared-resource/pull/156

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/csi-driver-shared-resource/pull/156

Bug OCPBUGS-21650: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-api-operator/pull/25

Bug OCPBUGS-24311: Update 4.15 ose-cluster-api-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-api/pull/188

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-api/pull/188

Bug OCPBUGS-19162: Update 4.15 ose-cluster-csi-snapshot-controller-operator image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-csi-snapshot-controller-operator/pull/160

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-csi-snapshot-controller-operator/pull/160

Bug OCPBUGS-16079: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/hypershift/pull/3067

Bug OCPBUGS-16707: Misleading error message to highlight "HostAlreadyClaimed"

Description of problem:

When a route hits the HostAlreadyClaimed error, the error message points to the wrong route name.

Version-Release number of selected component (if applicable):

OCP v4.12.z

How reproducible:

Frequently

Steps to Reproduce:

- Created three routes with the same host: one without a path and the other two with the same path defined.

# oc get routes
NAME     HOST/PORT                                                                       PATH    SERVICES        PORT   TERMINATION   WILDCARD
route1   httpd-example-path-based-routes.apps.firstcluster.lab.upshift.rdu2.redhat.com           httpd-example   web    edge          None
route2   httpd-example-path-based-routes.apps.firstcluster.lab.upshift.rdu2.redhat.com   /path   httpd-example   web    edge          None
route3   HostAlreadyClaimed                                                              /path   httpd-example   web    edge          None   <---------------


- Got the 'HostAlreadyClaimed' error for the third route, 'route3', which is expected because 'route2' and 'route3' have the same hostname and path.

- The route description reports the first route, 'route1', as the older route for the host, but we expect it to report 'route2', because 'route2' and 'route3' have the same hostname and path.

# oc describe route route3
Name:            route3
Namespace:        path-based-routes
Created:        14 seconds ago
Labels:            app=httpd-example
            template=httpd-example
Annotations:        <none>
Requested Host:        httpd-example-path-based-routes.apps.firstcluster.lab.upshift.rdu2.redhat.com
            rejected by router default:  (host router-default.apps.firstcluster.lab.upshift.rdu2.redhat.com)HostAlreadyClaimed (14 seconds ago)
              route route1 already exposes httpd-example-path-based-routes.apps.firstcluster.lab.upshift.rdu2.redhat.com and is older   <----------------
Path:            /path
TLS Termination:    edge
Insecure Policy:    <none>
Endpoint Port:        web

Service:    httpd-example
Weight:        100 (100%)
Endpoints:    10.1.2.3:8080 

- However, deleting 'route2' resolves the issue.

Actual results:

Only the hostname is taken into consideration; the route's path is not checked, so the HostAlreadyClaimed error reports the wrong route name.

Expected results:

The HostAlreadyClaimed error message should report the conflicting route based on both the hostname and the path.

https://github.com/openshift/router/pull/508

Bug OCPBUGS-17811: Ensure Bootstrap has access to Image Registry Certs

Description of problem:

In 4.14, the MCO became the default provider of image registry certificates. However, all of these certs are written to disk and into in-cluster config. Components like HyperShift need a way to provide the certificates they require to run properly during their bootstrap process.

Version-Release number of selected component (if applicable):

How reproducible:

Always with HyperShift

Steps to Reproduce:

1. Bootstrap a HyperShift cluster.
2. The bootstrap fails due to image pull errors.

Actual results:

Bootstrap fails due to missing image registry (IR) certs.

Expected results:

The component that needs the IR certs provides them through a command-line flag, and bootstrap succeeds.

Additional info:

https://github.com/openshift/machine-config-operator/pull/3876

Bug OCPBUGS-24862: Update 4.16 ose-installer-altinfra-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/installer/pull/7819

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/installer/pull/7872

Task MIXEDARCH-353: run yq from upi-installer in ipi-install-heterogeneous

The script at https://github.com/openshift/release/blob/master/ci-operator/step-registry/ipi/install/heterogeneous/ipi-install-heterogeneous-commands.sh#L37-L42 downloads yq-v4 from GitHub and uses it in the following step.

This can become a problem when many jobs run concurrently, because GitHub may deny access.

We have hit such issues before, so we installed yq-3.3.0 in the upi-installer image (see https://github.com/openshift/installer/blob/master/images/installer/Dockerfile.upi.ci.rhel8#L46-L50). Is it possible to migrate the code to use yq-3.3.0 from the upi-installer image?

Before we migrate many CI jobs from arm and amd to multi-arch CI, we need to resolve such issues.

cc Lin Wang

https://github.com/openshift/installer/pull/7567

Bug OCPBUGS-20364: [azure] missing instance type validation check under defaultMachinePlatform

Description of problem:

There is no instance type validation check under defaultMachinePlatform.
For example, set platform.azure.defaultMachinePlatform.type to Standard_D11_v2, which does not support PremiumIO, then create manifests:
 
# az vm list-skus --location southcentralus --size Standard_D11_v2 --query "[].capabilities[?name=='PremiumIO'].value" -otsv
False

install-config.yaml:
-------------------
platform:
  azure:
    defaultMachinePlatform:
      type: Standard_D11_v2
    baseDomainResourceGroupName: os4-common
    cloudName: AzurePublicCloud
    outboundType: Loadbalancer
    region: southcentralus

Manifests are created successfully:
$ ./openshift-install create manifests --dir ipi
INFO Credentials loaded from file "/home/fedora/.azure/osServicePrincipal.json" 
INFO Consuming Install Config from target directory 
INFO Manifests created in: ipi/manifests and ipi/openshift 

whereas the expected error is returned when type is set under compute:
$ ./openshift-install create manifests --dir ipi
INFO Credentials loaded from file "/home/fedora/.azure/osServicePrincipal.json" 
ERROR failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: compute[0].platform.azure.osDisk.diskType: Invalid value: "Premium_LRS": PremiumIO not supported for instance type Standard_D11_v2

The same happens for the vmNetworkingType field under defaultMachinePlatform: instance type Standard_B4ms does not support accelerated networking.
# az vm list-skus --location southcentralus --size Standard_B4ms --query "[].capabilities[?name=='AcceleratedNetworkingEnabled'].value" -otsv
False

install-config.yaml
----------------
platform:
  azure:
    defaultMachinePlatform:
      type: Standard_B4ms
      vmNetworkingType: "Accelerated" 

The installer still creates the manifests successfully. It should exit with the same error it reports when type and vmNetworkingType are set under compute:
ERROR failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: compute[0].platform.azure.vmNetworkingType: Invalid value: "Accelerated": vm networking type is not supported for instance type Standard_B4ms

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-10-08-220853

How reproducible:

Always, on all supported versions.

Steps to Reproduce:

1. Configure an invalid instance type (e.g. one that does not support PremiumIO) under defaultMachinePlatform in install-config.yaml.
2. Create manifests.

Actual results:

The installer creates manifests successfully.

Expected results:

The installer should exit with an error, matching its behavior when an invalid instance type is configured under compute and controlPlane.
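A minimal sketch of the expected behavior, not the installer's actual code: fields from defaultMachinePlatform are merged into each pool before the instance-type capability checks run, so an invalid default is rejected the same way as an invalid compute setting. The names `SKU_CAPABILITIES`, `effective_platform`, `validate_pool` and `validate` are hypothetical, the platform fields are flattened for brevity, the capability values are only the two queried with `az vm list-skus` above, and Premium_LRS as the default disk type is an assumption taken from the compute error message.

```python
from typing import Dict, List, Optional

# Hypothetical capability table: only the two values queried with
# `az vm list-skus` above are filled in; anything absent is not checked.
SKU_CAPABILITIES: Dict[str, Dict[str, bool]] = {
    "Standard_D11_v2": {"PremiumIO": False},
    "Standard_B4ms": {"AcceleratedNetworkingEnabled": False},
}


def effective_platform(default: Dict[str, str],
                       pool: Optional[Dict[str, str]]) -> Dict[str, str]:
    """Overlay pool-level fields on top of defaultMachinePlatform fields."""
    merged = dict(default)
    merged.update(pool or {})
    return merged


def validate_pool(path: str, platform: Dict[str, str]) -> List[str]:
    """Return capability errors for one pool's effective Azure platform."""
    errors: List[str] = []
    itype = platform.get("type", "")
    caps = SKU_CAPABILITIES.get(itype, {})
    # Assumption: Premium_LRS is the default OS disk type, as the compute error suggests.
    if (platform.get("diskType", "Premium_LRS") == "Premium_LRS"
            and caps.get("PremiumIO") is False):
        errors.append(f"{path}.osDisk.diskType: PremiumIO not supported "
                      f"for instance type {itype}")
    if (platform.get("vmNetworkingType") == "Accelerated"
            and caps.get("AcceleratedNetworkingEnabled") is False):
        errors.append(f"{path}.vmNetworkingType: vm networking type is not "
                      f"supported for instance type {itype}")
    return errors


def validate(default: Dict[str, str],
             pools: Dict[str, Optional[Dict[str, str]]]) -> List[str]:
    """Validate every pool after merging in the defaults."""
    errors: List[str] = []
    for path, pool in pools.items():
        errors.extend(validate_pool(path, effective_platform(default, pool)))
    return errors


# The first install-config above: the type is set only under defaultMachinePlatform.
for err in validate({"type": "Standard_D11_v2"},
                    {"controlPlane.platform.azure": None,
                     "compute[0].platform.azure": None}):
    print(err)
```

Because the merge happens before validation, a pool that overrides the type (for example with a type that supports PremiumIO) is checked against its own value rather than the default.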

Additional info:

https://github.com/openshift/installer/pull/7584

Bug OCPBUGS-20369: worker CSR are pending, so no worker nodes available

Description of problem:

Worker CSRs are pending, so no worker nodes are available.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-10-06-234925

How reproducible:

Always

Steps to Reproduce:

Create a cluster with the profile aws-c2s-ipi-disconnected-private-fips.

Actual results:

Worker CSRs are pending.

Expected results:

Workers should be up and running, with all CSRs approved.

Additional info:

The cluster-machine-approver logs show "failed to find machine for node ip-10-143-1-120".

It seems the node names should be fully qualified, like
"ip-10-143-1-120.ec2.internal".

It fails here: https://github.com/openshift/cluster-machine-approver/blob/master/pkg/controller/csr_check.go#L263
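To illustrate the suspected mismatch, here is a minimal sketch assuming the approver looks up the machine by comparing the node name to the machine's InternalDNS addresses with exact string equality. The machine record, its name `worker-0` and the helper `find_machine_for_node` are hypothetical; only the hostname and IP come from the log above. A short node name never equals the fully qualified address.

```python
from typing import Dict, List, Optional

# Hypothetical machine record; address types follow the Machine API naming,
# values are derived from the node name in the log above.
MACHINES: List[Dict] = [
    {
        "name": "worker-0",  # hypothetical machine name
        "addresses": [
            {"type": "InternalIP", "address": "10.143.1.120"},
            {"type": "InternalDNS", "address": "ip-10-143-1-120.ec2.internal"},
        ],
    },
]


def find_machine_for_node(node_name: str, machines: List[Dict]) -> Optional[str]:
    """Exact-match the node name against each machine's InternalDNS addresses."""
    for machine in machines:
        for addr in machine["addresses"]:
            if addr["type"] == "InternalDNS" and addr["address"] == node_name:
                return machine["name"]
    return None


print(find_machine_for_node("ip-10-143-1-120", MACHINES))               # None
print(find_machine_for_node("ip-10-143-1-120.ec2.internal", MACHINES))  # worker-0
```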

Must-gather - https://drive.google.com/file/d/15tz9TLdTXrH6bSBSfhlIJ1l_nzeFE1R3/view?usp=sharing

cluster - https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/238922/

cc Yunfei Jiang Zhaohua Sun

https://github.com/openshift/machine-config-operator/pull/3979

Bug OCPBUGS-20481: Multi-vcenter and wrong user/password in secret/vmware-vsphere-cloud-credentials causes the vSphere CSI Driver controller pods restarting

Description of problem:
When secret/vmware-vsphere-cloud-credentials in ns/openshift-cluster-csi-drivers contains credentials for multiple vCenters (see bug https://issues.redhat.com/browse/OCPBUGS-20478), the vSphere CSI Driver controller pods restart continuously.

vmware-vsphere-csi-driver-controller-545dc5679f-mdsjt   0/13    Pending             0             0s
vmware-vsphere-csi-driver-controller-545dc5679f-mdsjt   0/13    ContainerCreating   0             0s
vmware-vsphere-csi-driver-controller-587f78b9c7-br4gs   0/13    Terminating         0             3s
vmware-vsphere-csi-driver-controller-545dc5679f-mdsjt   0/13    Terminating         0             1s
vmware-vsphere-csi-driver-controller-587f78b9c7-9pfmp   0/13    Pending             0             0s
vmware-vsphere-csi-driver-controller-587f78b9c7-9pfmp   0/13    Pending             0             0s
vmware-vsphere-csi-driver-controller-587f78b9c7-9pfmp   0/13    ContainerCreating   0             0s
vmware-vsphere-csi-driver-controller-587f78b9c7-qdb89   12/13   Terminating         0             9s
vmware-vsphere-csi-driver-controller-b946b657-7t74p     13/13   Terminating         0             9s
vmware-vsphere-csi-driver-controller-545dc5679f-mdsjt   0/13    Terminating         0             3s
vmware-vsphere-csi-driver-controller-587f78b9c7-qdb89   0/13    Terminating         0             10s
vmware-vsphere-csi-driver-controller-545dc5679f-75wfm   12/13   Terminating         0             9s
vmware-vsphere-csi-driver-controller-587f78b9c7-9pfmp   0/13    ContainerCreating   0             2s
vmware-vsphere-csi-driver-controller-587f78b9c7-qdb89   0/13    Terminating         0             11s
vmware-vsphere-csi-driver-controller-587f78b9c7-qdb89   0/13    Terminating         0             11s
vmware-vsphere-csi-driver-controller-587f78b9c7-qdb89   0/13    Terminating         0             11s
vmware-vsphere-csi-driver-controller-545dc5679f-75wfm   0/13    Terminating         0             10s
vmware-vsphere-csi-driver-controller-545dc5679f-75wfm   0/13    Terminating         0             11s
vmware-vsphere-csi-driver-controller-545dc5679f-75wfm   0/13    Terminating         0             11s
vmware-vsphere-csi-driver-controller-545dc5679f-75wfm   0/13    Terminating         0             11s

$ oc get co storage
storage                                    4.14.0-0.nightly-2023-10-10-084534   False       True          False      15s     VSphereCSIDriverOperatorCRAvailable: VMwareVSphereDriverControllerServiceControllerAvailable: Waiting for Deployment

$ oc logs -f deployment.apps/vmware-vsphere-csi-driver-controller --tail=500
{"level":"error","time":"2023-10-12T11:40:38.920487342Z","caller":"service/driver.go:189","msg":"failed to init controller. Error: ServerFaultCode: Cannot complete login due to an incorrect user name or password.","TraceId":"5e60e6c5-efeb-4080-888c-74182e4fb1f4","TraceId":"ec636d3d-1ddb-43a5-b9f7-8541dacff583","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).BeforeServe\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/csi/service/driver.go:189\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).Run\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/csi/service/driver.go:202\nmain.main\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/cmd/vsphere-csi/main.go:71\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:250"}
{"level":"info","time":"2023-10-12T11:40:38.920536779Z","caller":"service/driver.go:109","msg":"Configured: \"csi.vsphere.vmware.com\" with clusterFlavor: \"VANILLA\" and mode: \"controller\"","TraceId":"5e60e6c5-efeb-4080-888c-74182e4fb1f4","TraceId":"ec636d3d-1ddb-43a5-b9f7-8541dacff583"}
{"level":"error","time":"2023-10-12T11:40:38.920572294Z","caller":"service/driver.go:203","msg":"failed to run the driver. Err: +ServerFaultCode: Cannot complete login due to an incorrect user name or password.","TraceId":"5e60e6c5-efeb-4080-888c-74182e4fb1f4","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).Run\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/csi/service/driver.go:203\nmain.main\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/cmd/vsphere-csi/main.go:71\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:250"}

$ oc logs vmware-vsphere-csi-driver-operator-b4b8d5d56-f76pc
I1012 11:43:08.973130       1 event.go:298] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-cluster-csi-drivers", Name:"vmware-vsphere-csi-driver-operator", UID:"a8492b8c-8c13-4b15-aedc-6f3ced80618e", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'DeploymentUpdateFailed' Failed to update Deployment.apps/vmware-vsphere-csi-driver-controller -n openshift-cluster-csi-drivers: Operation cannot be fulfilled on deployments.apps "vmware-vsphere-csi-driver-controller": the object has been modified; please apply your changes to the latest version and try again
E1012 11:43:08.996554       1 base_controller.go:268] VMwareVSphereDriverControllerServiceController reconciliation failed: Operation cannot be fulfilled on deployments.apps "vmware-vsphere-csi-driver-controller": the object has been modified; please apply your changes to the latest version and try again
W1012 11:43:08.999148       1 driver_starter.go:206] CSI driver can only connect to one vcenter, more than 1 set of credentials found for CSI driver
W1012 11:43:09.390489       1 driver_starter.go:206] CSI driver can only connect to one vcenter, more than 1 set of credentials found for CSI driver
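The operator warning above fires when the secret holds more than one set of credentials. A minimal sketch of that condition, assuming the secret keys follow the `<vcenter>.username` / `<vcenter>.password` convention; the helper names and the hostnames `vcenter-1.example.com` / `vcenter-2.example.com` are hypothetical, and this is not the operator's actual code.

```python
from typing import Dict, Optional, Set


def vcenters_in_secret(data: Dict[str, str]) -> Set[str]:
    """Collect vCenter hostnames from '<vcenter>.username' / '<vcenter>.password' keys."""
    servers: Set[str] = set()
    for key in data:
        for suffix in (".username", ".password"):
            if key.endswith(suffix):
                servers.add(key[: -len(suffix)])
    return servers


def single_vcenter_warning(data: Dict[str, str]) -> Optional[str]:
    """Return a warning when the secret holds credentials for more than one vCenter."""
    if len(vcenters_in_secret(data)) > 1:
        return ("CSI driver can only connect to one vcenter, "
                "more than 1 set of credentials found for CSI driver")
    return None


# Hypothetical secret contents with two vCenters (values elided).
secret = {
    "vcenter-1.example.com.username": "...",
    "vcenter-1.example.com.password": "...",
    "vcenter-2.example.com.username": "...",
    "vcenter-2.example.com.password": "...",
}
print(sorted(vcenters_in_secret(secret)))
print(single_vcenter_warning(secret))
```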

Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-10-10-084534

How reproducible:
Always

Steps to Reproduce:
See Description

Actual results:
Storage CSI Driver pods are restarting

Expected results:
Storage CSI Driver pods should not restart

Bug OCPBUGS-25210: PipelineRuns is not loaded on repository details page

https://github.com/openshift/console/pull/13439

Task HOSTEDCP-1319: Fix Dependabot

Dependabot is not updating dependencies. Investigate & fix.

https://github.com/openshift/hypershift/pull/3246

Bug OCPBUGS-17841: GCP SNO installation fails because redirect ipt doesn't take effect on SGW

I tried upgrading a 4.14 SNO cluster from one nightly image to another. The upgrade works fine on AWS but fails on GCP.

The Cluster Network Operator successfully upgrades ovn-kubernetes but is stuck on the cloud network config controller, which is in CrashLoopBackOff because it receives the wrong IP address from the name server when trying to reach the API server. The node IP is actually 10.0.0.3, but the name server returns 10.0.0.2, which I suspect is the bootstrap node IP (only a guess).

Some relevant logs:

$ oc get co network
network                                    4.14.0-0.nightly-2023-08-15-200133   True        True          False      86m     Deployment "/openshift-cloud-network-config-controller/cloud-network-config-controller" is not available (awaiting 1 nodes)

$ oc get pods -n openshift-ovn-kubernetes -o wide
NAME                                     READY   STATUS    RESTARTS       AGE   IP         NODE                                 NOMINATED NODE   READINESS GATES
ovnkube-control-plane-844c8f76fb-q4tvp   2/2     Running   3              24m   10.0.0.3   ci-ln-rij2p1b-72292-xmzf4-master-0   <none>           <none>
ovnkube-node-24kb7                       10/10   Running   12 (13m ago)   25m   10.0.0.3   ci-ln-rij2p1b-72292-xmzf4-master-0   <none>           <none>

$ oc get pods -n openshift-cloud-network-config-controller -o wide
openshift-cloud-network-config-controller          cloud-network-config-controller-d65ccbc5b-dnt69               0/1     CrashLoopBackOff   15 (2m37s ago)   40m    10.128.0.141   ci-ln-rij2p1b-72292-xmzf4-master-0   <none>           <none>

$ oc logs -n openshift-cloud-network-config-controller cloud-network-config-controller-d65ccbc5b-dnt69
W0816 11:06:00.666825       1 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
F0816 11:06:30.673952       1 main.go:345] Error building controller runtime client: Get "https://api-int.ci-ln-rij2p1b-72292.gcp-2.ci.openshift.org:6443/api?timeout=32s": dial tcp 10.0.0.2:6443: i/o timeout

I also get 10.0.0.2 if I run a DNS query from the node itself or from a pod:

dig api-int.ci-ln-zp7dbyt-72292.gcp-2.ci.openshift.org
...
;; ANSWER SECTION:
api-int.ci-ln-zp7dbyt-72292.gcp-2.ci.openshift.org. 60 IN A 10.0.0.2

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always.

Steps to Reproduce:

1. On clusterbot: launch 4.14 gcp,single-node
2. In a terminal: oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release:4.14.0-0.nightly-2023-08-15-200133 --allow-explicit-upgrade --force

Actual results:

The name server returns 10.0.0.2, so CNCC fails to reach the API server.

Expected results:

The name server should return 10.0.0.3.

Must-gather: https://drive.google.com/file/d/1MDbsMgIQz7dE6e76z4ad95dwaxbSNrJM/view?usp=sharing

I'm assigning this bug to the network edge team for a first pass. Please reassign it if necessary.

https://github.com/openshift/machine-config-operator/pull/3953

Bug OCPBUGS-18707: Inadvertent peering of alertmanager instances during upgrade

Description of problem:

The cluster and user workload alertmanager instances inadvertently become peered during upgrade.

How reproducible:

Infrequently: the customer observed this on 3 clusters out of 15.

Steps to Reproduce:

Deploy user workload monitoring:

~~~
 config.yaml: |
    enableUserWorkload: true
    prometheusK8s:
~~~

Deploy the user workload alertmanager:

~~~
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    alertmanager:
      enabled: true 
~~~

Upgrade the cluster.
Verify the state of the alertmanager clusters:

~~~
$ oc exec -n openshift-monitoring alertmanager-main-0 -- amtool cluster show -o json --alertmanager.url=http://localhost:9093
~~~

Actual results:

alertmanager shows 4 peers.

Expected results:

There should be 2 pairs, i.e. 2 separate alertmanager clusters of 2 peers each.

Additional info:

Mitigation steps:

Scaling down one of the alertmanager statefulsets to 0 and then scaling it up again restores the expected configuration (i.e. 2 separate alertmanager clusters).

The customer then added NetworkPolicies to prevent alertmanager gossip between namespaces.

https://github.com/openshift/prometheus-operator/pull/255

Bug OCPBUGS-17203: Mock apis of git repo for "test serverless function" tests

Description of problem:

Rate-limit errors and other failures occur while running the "test serverless function" tests.

https://github.com/openshift/console/pull/13064

Bug OCPBUGS-19202: Update 4.15 ose-route-controller-manager image to be consistent with ART

Please review the following PR: https://github.com/openshift/route-controller-manager/pull/30

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/route-controller-manager/pull/30

Bug OCPBUGS-19377: Upgrade from OpenShift 4.13 to 4.14 Leaves Network Operator Degraded

Description of problem:

After upgrading from OpenShift 4.13 to 4.14 with the Kuryr network type, the network operator shows as Degraded and the cluster version reports that it is unable to apply the 4.14 update. The issue seems to be related to MTU settings, as indicated by the message: "Not applying unsafe configuration change: invalid configuration: [cannot change mtu for the Pods Network]."

Version-Release number of selected component (if applicable):

Upgrading from 4.13 to 4.14
4.14.0-0.nightly-2023-09-15-233408
Kuryr network type
RHOS-17.1-RHEL-9-20230907.n.1

How reproducible:

Consistently reproducible on attempting to upgrade from 4.13 to 4.14.

Steps to Reproduce:

1. Install OpenShift version 4.13 on OpenStack.
2. Initiate an upgrade to OpenShift version 4.14.

Actual results:

The network operator shows as Degraded with the message:

network                                    4.13.13                              True        False         True       13h     Not applying unsafe configuration change: invalid configuration: [cannot change mtu for the Pods Network]. Use 'oc edit network.operator.openshift.io cluster' to undo the change.
 
Additionally, "oc get clusterversions" shows:

Unable to apply 4.14.0-0.nightly-2023-09-15-233408: wait has exceeded 40 minutes for these operators: network

Expected results:

The upgrade should complete successfully without any operator being degraded.

Additional info:

Some components remain at version 4.13.13 despite the upgrade attempt. Specifically, the dns, machine-config, and network operators are still at version 4.13.13:

$ oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE                                                                                                         
authentication                             4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
baremetal                                  4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
cloud-controller-manager                   4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
cloud-credential                           4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
cluster-autoscaler                         4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
config-operator                            4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
console                                    4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
control-plane-machine-set                  4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
csi-snapshot-controller                    4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
dns                                        4.13.13                              True        False         False      13h                                                                                                                     
etcd                                       4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
image-registry                             4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
ingress                                    4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
insights                                   4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
kube-apiserver                             4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
kube-controller-manager                    4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
kube-scheduler                             4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
kube-storage-version-migrator              4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
machine-api                                4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
machine-approver                           4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
machine-config                             4.13.13                              True        False         False      13h                                                                                                                     
marketplace                                4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
monitoring                                 4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
network                                    4.13.13                              True        False         True       13h     Not applying unsafe configuration change: invalid configuration: [cannot change mtu for the Pods Network]. Use 'oc edit network.operator.openshift.io cluster' to undo the change.
node-tuning                                4.14.0-0.nightly-2023-09-15-233408   True        False         False      12h                                                                                                                     
openshift-apiserver                        4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
openshift-controller-manager               4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
openshift-samples                          4.14.0-0.nightly-2023-09-15-233408   True        False         False      12h                                                                                                                     
operator-lifecycle-manager                 4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
operator-lifecycle-manager-catalog         4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
operator-lifecycle-manager-packageserver   4.14.0-0.nightly-2023-09-15-233408   True        False         False      12h                                                                                                                     
service-ca                                 4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
storage                                    4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h
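Spotting the operators that stayed behind in a long `oc get co` listing is easier with a small filter. A minimal sketch, assuming plain `oc get co` output in which every row has a VERSION value; the helper `operators_not_at` is hypothetical and works on an excerpt of the output above.

```python
from typing import List


def operators_not_at(output: str, target: str) -> List[str]:
    """Return cluster operators whose VERSION column differs from target.

    Assumes plain `oc get co` output where every row has a VERSION value.
    """
    stale: List[str] = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 2 and fields[1] != target:
            stale.append(fields[0])
    return stale


# Excerpt of the `oc get co` output above.
sample = """\
NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication   4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h
dns              4.13.13                              True        False         False      13h
machine-config   4.13.13                              True        False         False      13h
network          4.13.13                              True        False         True       13h     Not applying unsafe configuration change
"""
print(operators_not_at(sample, "4.14.0-0.nightly-2023-09-15-233408"))
# ['dns', 'machine-config', 'network']
```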

https://github.com/openshift/cluster-network-operator/pull/2007

Task OPRUN-3075: Downstream Sync for operator-controller v0.7.0

Bring the downstream operator-controller repo up-to-date with the v0.7.0 upstream release.

https://github.com/openshift/operator-framework-operator-controller/pull/31

Bug OCPBUGS-19186: Update 4.15 ose-image-customization-controller image to be consistent with ART

Please review the following PR: https://github.com/openshift/image-customization-controller/pull/99

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/image-customization-controller/pull/99

Bug OCPBUGS-19212: Update 4.15 csi-driver-nfs image to be consistent with ART

Please review the following PR: https://github.com/openshift/csi-driver-nfs/pull/129

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/csi-driver-nfs/pull/129

Bug OCPBUGS-22054: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-api-provider-ibmcloud/pull/62

Bug OCPBUGS-22691: error when adding sriov pods to multus-cni-network

Description of problem:

When trying to create SR-IOV pods, the pods are stuck in the ContainerCreating state.

Pod definition:

apiVersion: v1
kind: Pod
metadata:
  name: test-sriov-pod
  namespace: default
  annotations:
    v1.multus-cni.io/default-network: default/ftnetattach
  labels:
    pod-name: ft-iperf-server-pod-v4
spec:
  containers:
  - name: ft-iperf-server-pod-v4
    image: quay.io/wizhao/ft-base-image:0.8-x86_64

net-attach-def:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  annotations:
    k8s.v1.cni.cncf.io/resourceName: openshift.io/mlxnics
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"k8s.cni.cncf.io/v1","kind":"NetworkAttachmentDefinition","metadata":{"annotations":{"k8s.v1.cni.cncf.io/resourceName":"openshift.io/mlxnics"},"name":"ftnetattach","namespace":"default"},"spec":{"config":"{\"cniVersion\":\"0.3.1\",\"name\":\"ftnetattach\",\"type\":\"ovn-k8s-cni-overlay\",\"logFile\":\"/var/log/ovn-kubernetes/flowtest.log\",\"logLevel\":\"4\",\"ipam\":{},\"dns\":{}}"}}
  creationTimestamp: "2023-10-27T20:59:38Z"
  generation: 1
  name: ftnetattach
  namespace: default
  resourceVersion: "241792"
  uid: c394f8bc-20bc-4d0f-b5ce-9f5baad7c3de
spec:
  config: '{"cniVersion":"0.3.1","name":"ftnetattach","type":"ovn-k8s-cni-overlay","logFile":"/var/log/ovn-kubernetes/flowtest.log","logLevel":"4","ipam":{},"dns":{}}'

A bisect suggests this error was introduced by this change: https://github.com/ovn-org/ovn-kubernetes/pull/3958

How reproducible:

Every time

Steps to Reproduce:

1. Deploy the SR-IOV network operator
2. Apply ovn-k8s-cni-overlay net-attach-def
3. Create pod

Actual results:

[]# oc get pod test-sriov-pod                                                                                                                        
NAME             READY   STATUS              RESTARTS   AGE
test-sriov-pod   0/1     ContainerCreating   0          2d18h

[] oc describe pod test-sriov-pod
<....>

  Warning  FailedCreatePodSandBox  36s (x18366 over 2d18h)  kubelet  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox
k8s_test-sriov-pod_default_12194f6e-96ea-4255-be89-a05c57e7d85b_0(cfd3586aa90898cb4197f9c659b80f9e50989fc847e7722a529d137d450a9feb): error adding pod default_test-sriov-pod to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: '&{ContainerID:cfd3586aa90898cb4197f9c659b80f9e50989fc847e7722a529d137d450a9feb Netns:/var/run/netns/58ad326c-68fe-487a-b449-ff1e0d9bbb64 IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=test-sriov-pod;K8S_POD_INFRA_CONTAINER_ID=cfd3586aa90898cb4197f9c659b80f9e50989fc847e7722a529d137d450a9feb;K8S_POD_UID=12194f6e-96ea-4255-be89-a05c57e7d85b Path: StdinData:[123 34 98 105 110 68 105 114 34 58 34 47 118 97 114 47 108 105 98 47 99 110 105 47 98 105 110
34 44 34 99 104 114 111 111 116 68 105 114 34 58 34 47 104 111 115 116 114 111 111 116 34 44 34 99 108 117 115 116 101 114 78 101 116 119 111 114 107 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 99 110 105 47 110 101 116 46 100 47 49 48 45 111 118 110 45 107 117 98 101 114 110 101 116 101 115 46 99 111 110 102 34 44 34 99 110 105 67 111 110 102 105 103 68 105 114 34 58 34 47 104 111 115 116 47 101 116 99 47 99 110 105 47 110 101 116 46 100 34 44 34 99 110 105 86 101 114 115 105 111 110 34 58 34 48 46 51 46 49 34 44 34 100 97 101 109 111 110 83 111 99 107 101 116 68 105 114 34 58 34 47 114 117 110 47 109 117 108 116 117 115 47 115 111 99 107 101 116 34 44 34 103 108 111 98 97 108 78 97 109 101 115 112 97 99 101 115 34 58 34 100 101 102 97 117 108 116 44 111 112 101 110 115 104 105 102 116 45 109 117 108 116 117 115 44 111 112 101 110 115 104 105 102 116 45 115 114 105 111 118 45 110 101 116 119 111 114 107 45 111 112 101 114 97 116 111 114 34 44 34 108 111 103 76
101 118 101 108 34 58 34 118 101 114 98 111 115 101 34 44 34 108 111 103 84 111 83 116 100 101 114 114 34 58 116 114 117 101 44 34 109 117 108 116 117 115 65 117 116 111 99 111 110 102 105 103 68 105 114 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 99 110 105 47 110 101 116 46 100 34 44 34 109 117 108 116 117 115 67 111 110 102 105 103 70 105 108 101 34 58 34 97 117 116 111 34 44 34 110 97 109 101 34 58 34 109 117 108 116 117 115 45 99 110 105 45 110 101 116 119 111 114 107 34 44 34 110 97 109 101 115 112 97 99 101 73 115 111 108 97 116 105 111 110 34 58 116 114 117 101 44 34 112 101 114 78 111 100 101 67 101 114 116 105 102 105 99 97 116 101 34 58 123 34 98 111 111 116 115 116 114 97 112 75 117 98 101 99 111 110 102 105 103 34 58 34 47 118 97 114 47 108 105 98 47 107 117 98 101 108 101 116 47 107 117 98 101 99 111 110 102 105 103 34 44 34 99 101 114 116 68 105 114 34 58 34 47 101 116 99 47 99 110 105 47 109 117 108 116 117 115 47 99 101 114 116 115 34 44 34 99 101 114 116 68 117 114 97 116 105 111 110 34 58 34 50 52 104 34 44 34 101 110 97 98 108 101 100 34 58 116 114 117 101 125 44 34 115 111 99 107 101 116 68 105 114 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 115 111 99 107 101 116 34 44 34 116 121 112 101 34 58 34 109 117 108 116 117 115 45 115 104 105 109 34 125]} ContainerID:"cfd3586aa90898cb4197f9c659b80f9e50989fc847e7722a529d137d450a9feb" Netns:"/var/run/netns/58ad326c-68fe-487a-b449-ff1e0d9bbb64" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=test-sriov-pod;K8S_POD_INFRA_CONTAINER_ID=cfd3586aa90898cb4197f9c659b80f9e50989fc847e7722a529d137d450a9feb;K8S_POD_UID=12194f6e-96ea-4255-be89-a05c57e7d85b" Path:"" ERRORED: error configuring pod [default/test-sriov-pod] networking: [default/test-sriov-pod/12194f6e-96ea-4255-be89-a05c57e7d85b:ftnetattach]: error adding container to network "ftnetattach": failed to send CNI request: Post "http://dummy/": EOF
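The `StdinData` field in the event above is a byte slice printed as space-separated decimal values. Decoded, it is the JSON network configuration passed on the CNI request's stdin. Below is a minimal Python sketch for turning such a dump back into readable text. It assumes the `StdinData:[...]` format shown here, and the helper `decode_stdin_data` is hypothetical, not part of any OpenShift tool:

```python
import json
import re


def decode_stdin_data(log_text: str) -> str:
    """Extract the first 'StdinData:[...]' byte list from log text and decode it.

    Hypothetical helper: assumes the bytes are printed as whitespace-separated
    decimal values inside square brackets, as in the kubelet event above.
    """
    # [\d\s]+ also matches newlines, so a byte list wrapped across lines still parses.
    match = re.search(r"StdinData:\[([\d\s]+)\]", log_text)
    if match is None:
        raise ValueError("no StdinData byte list found")
    raw = bytes(int(tok) for tok in match.group(1).split())
    return raw.decode("utf-8")


if __name__ == "__main__":
    # Short excerpt in the same format; decodes to {"name":"multus-cni-network"}.
    sample = ("Path: StdinData:[123 34 110 97 109 101 34 58 34 109 117 108 116 117 115 45 "
              "99 110 105 45 110 101 116 119 111 114 107 34 125]}")
    print(json.dumps(json.loads(decode_stdin_data(sample)), indent=2))
```

Pasting the full byte list from the event above into `decode_stdin_data` should yield the multus-shim configuration (`binDir`, `cniVersion`, `socketDir`, and so on) in readable form.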

Expected results:

The pod is created and allocated a device.

Additional info:

Red Hat CoreOS: 414.92.202310270216-0
Cluster version: 4.14.0-0.nightly-multi-2023-10-27-070855

https://github.com/openshift/ovn-kubernetes/pull/1952

Bug OCPBUGS-23164: Console: Cannot Edit Shipwright Build

Description of problem:

Shipwright Builds cannot be edited in the developer and admin consoles when using the upcoming builds for Red Hat OpenShift release (based on Shipwright v0.12.0).

The workaround is to use `oc edit build.shipwright.io ...`.

Version-Release number of selected component (if applicable):

OCP 4.14
builds for OpenShift v1.0.0

How reproducible:

Always

Steps to Reproduce:

1. Deploy the builds for Red Hat OpenShift release candidate operator.
2. Create a Build using the shp command line: `shp build create ...`
3. Open the Dev or Admin console for Shipwright Builds.
4. Attempt to edit the Build object.

Actual results:

The page appears to "freeze" and does not allow editing.

Expected results:

Shipwright Build objects can be edited.

Additional info:

This can be reproduced by deploying the following "test catalog", quay.io/adambkaplan/shipwright-io/operator-catalog:v0.13.0-rc7, and then creating a subscription for the Shipwright operator.

It will likely be easier to reproduce once the downstream operator is available in the Red Hat OperatorHub catalog.

https://github.com/openshift/console/pull/13341

Bug OCPBUGS-16871: MCO - currentConfig missing on the filesystem

Description of problem:

If the `currentConfig` is removed from the master node, the Machine Config Daemon will not recreate it. 

The logs will say:
~~~
W0726 23:57:35.890645 3013426 daemon.go:1097] Got an error from auxiliary tools: could not get current config from disk: open /etc/machine-config-daemon/currentconfig: no such file or directory
~~~

However, the MCD does not recreate the currentconfig file.
Is this the desired behavior?

The workaround is to create the correct annotation.

Version-Release number of selected component (if applicable):

OpenShift 4.12; also tested on 4.13

How reproducible:

Steps to Reproduce:

1. Remove the currentConfig from the node.
2. Check the status of the MCD.

Actual results:

- The currentconfig file is missing, which stops the MCD.

Expected results:

- If the currentconfig file is missing, the MCD should reconcile based on the node's desiredConfig annotation.
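The expected result amounts to a fallback rule: use the on-disk currentconfig if it exists, otherwise reconcile against the MachineConfig the node is supposed to run. The sketch below is hypothetical and is not MCD code. The function name `choose_config_source` is invented for illustration. Its only assumptions about the MCO are the `machineconfiguration.openshift.io/desiredConfig` node annotation and the file path from the daemon log above:

```python
from pathlib import Path
from typing import Mapping, Tuple

# Node annotation in which the MCO records the MachineConfig a node should run.
DESIRED_CONFIG_ANNOTATION = "machineconfiguration.openshift.io/desiredConfig"

# On-disk location named in the daemon log above.
CURRENT_CONFIG_PATH = Path("/etc/machine-config-daemon/currentconfig")


def choose_config_source(
    annotations: Mapping[str, str],
    current_config: Path = CURRENT_CONFIG_PATH,
) -> Tuple[str, str]:
    """Decide where the daemon takes its baseline config from (hypothetical rule).

    Prefer the on-disk currentconfig; if it is missing, fall back to the
    MachineConfig named by the node's desiredConfig annotation instead of stopping.
    """
    if current_config.is_file():
        return ("disk", str(current_config))
    desired = annotations.get(DESIRED_CONFIG_ANNOTATION)
    if not desired:
        # Nothing to reconcile against; surface the problem instead of guessing.
        raise RuntimeError("currentconfig missing and node has no desiredConfig annotation")
    return ("annotation", desired)
```

For example, `choose_config_source({DESIRED_CONFIG_ANNOTATION: "rendered-master-abc"}, Path("/missing/currentconfig"))` returns `("annotation", "rendered-master-abc")` rather than halting the daemon.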

Additional info:

https://github.com/openshift/machine-config-operator/pull/3963

Bug OCPBUGS-26005: Bump to kubernetes 1.28.5

This fix contains the following changes from the updated version of Kubernetes, up to v1.28.5:

Changelog:
v1.28.5: https://github.com/kubernetes/kubernetes/blob/release-1.28/CHANGELOG/CHANGELOG-1.28.md#changelog-since-v1284

https://github.com/openshift/kubernetes/pull/1837

Story MGMT-16291: Default values with broken OCP version

https://github.com/openshift/assisted-service/pull/5737

Bug OCPBUGS-22562: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/azure-file-csi-driver-operator/pull/82

Bug OCPBUGS-24142: Update 4.15 ose-kube-storage-version-migrator-container image to be consistent with ART

Please review the following PR: https://github.com/openshift/kubernetes-kube-storage-version-migrator/pull/201

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/kubernetes-kube-storage-version-migrator/pull/201

Bug OCPBUGS-24290: Generate a report about certificates violating metadata requirements

https://github.com/openshift/origin/pull/28432

Bug OCPBUGS-19783: Channel page shows "Required" message for the default name when navigate to create channel page

Description of problem:

When navigating to the create Channel page from the Add page or Topology, the default name "channel" is present, but the Create button is still disabled and "Required" is shown under the name field.

Version-Release number of selected component (if applicable):

4.12.0-0.nightly-2023-09-26-042251

How reproducible:

Always

Steps to Reproduce:

1. Install serverless operator
2. Go to Add page in developer perspective
3. Click on the channel card

Actual results:

The Create button is disabled and the error "Required" is shown under the name field, even though the field contains the default name "channel".

Expected results:

The Create button should be enabled.

Additional info:

If you switch to YAML view, the Create button becomes active, and it stays active after you switch back to form view.

https://github.com/openshift/console/pull/13222

Bug OCPBUGS-24658: [release-4.15] Observer -> Alerting, Metrics and Targets page does not load

Description of problem:

    The Observe -> Alerting, Metrics, and Targets pages do not load as expected; a blank page is shown instead.

Version-Release number of selected component (if applicable):

    4.15.0-0.nightly-2023-12-07-041003

How reproducible:

    Always

Steps to Reproduce:

    1. Navigate directly to the Observe -> Alerting, Metrics, or Targets page.

Actual results:

    Blank page; no data is loaded.

Expected results:

    The pages load and work as normal.

Additional info:

 Failed to load resource: the server responded with a status of 404 (Not Found)
/api/accounts_mgmt/v1/subscriptions?page=1&search=external_cluster_id%3D%2715ace915-53d3-4455-b7e3-b7a5a4796b5c%27:1

Failed to load resource: the server responded with a status of 403 (Forbidden)
main-chunk-bb9ed989a7f7c65da39a.min.js:1 API call to get support level has failed r: Access denied due to cluster policy.
    at https://console-openshift-console.apps.ci-ln-9fl1l5t-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-bb9ed989a7f7c65da39a.min.js:1:95279
(anonymous) @ main-chunk-bb9ed989a7f7c65da39a.min.js:1
/api/kubernetes/apis/operators.coreos.com/v1alpha1/namespaces/#ALL_NS#/clusterserviceversions?:1
Failed to load resource: the server responded with a status of 404 (Not Found)
vendor-patternfly-5~main-chunk-95cb256d9fa7738d2c46.min.js:1 Modal: When using hasNoBodyWrapper or setting a custom header, ensure you assign an accessible name to the the modal container with aria-label or aria-labelledby.

https://github.com/openshift/monitoring-plugin/pull/85

Bug OCPBUGS-25440: [AWS] iam:TagInstanceProfile permission is required for ipi install

Description of problem:

iam:TagInstanceProfile is not listed in the official documentation [1], yet an IPI install fails if the iam:TagInstanceProfile permission is missing:

level=error msg=Error: creating IAM Instance Profile (ci-op-4hw2rz1v-49c30-zt9vx-worker-profile): AccessDenied: User: arn:aws:iam::301721915996:user/ci-op-4hw2rz1v-49c30-minimal-perm is not authorized to perform: iam:TagInstanceProfile on resource: arn:aws:iam::301721915996:instance-profile/ci-op-4hw2rz1v-49c30-zt9vx-worker-profile because no identity-based policy allows the iam:TagInstanceProfile action
level=error msg=    status code: 403, request id: bb0641f5-d01c-4538-b333-261a804ddb59

[1] https://docs.openshift.com/container-platform/4.14/installing/installing_aws/installing-aws-account.html#installation-aws-permissions_installing-aws-account

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2023-12-14-115151

How reproducible:

Always

Steps to Reproduce:

    1. Install a common IPI cluster with the minimal permissions provided in the official documentation.

Actual results:

Install failed.

Expected results:

Additional info:

install does a precheck for iam:TagInstanceProfile
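A permissions precheck of this kind boils down to a set difference between the actions the install needs and the actions the credentials grant. The Python sketch below is a minimal illustration only. `REQUIRED_ACTIONS` is a hypothetical, partial subset limited to the two instance-profile actions involved in this failure, and `missing_actions` is an invented helper. Neither reflects how openshift/installer actually implements its check, and wildcard grants such as `iam:*` are not handled:

```python
from typing import Iterable, List

# Hypothetical, partial subset of required IAM actions: only the two
# instance-profile actions involved in this bug are listed here.
REQUIRED_ACTIONS = frozenset({
    "iam:CreateInstanceProfile",
    "iam:TagInstanceProfile",
})


def missing_actions(granted: Iterable[str], required: Iterable[str] = REQUIRED_ACTIONS) -> List[str]:
    """Return required actions that are absent from the granted set, sorted for stable output."""
    return sorted(set(required) - set(granted))


if __name__ == "__main__":
    # Credentials that can create instance profiles but not tag them,
    # mirroring the AccessDenied error above.
    granted = ["iam:CreateInstanceProfile", "ec2:RunInstances"]
    gaps = missing_actions(granted)
    if gaps:
        print("precheck failed, missing:", ", ".join(gaps))
```

Running a check like this before provisioning would report the missing iam:TagInstanceProfile action up front instead of failing partway through the install.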

https://github.com/openshift/installer/pull/7843

Bug OCPBUGS-19250: Update 4.15 ose-cluster-machine-approver image to be consistent with ART

Please review the following PR: https://github.com/openshift/cluster-machine-approver/pull/201

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/cluster-machine-approver/pull/201

Requirements	Notes	IS MVP
Discover new offerings in Home Dashboard		Y
Access details outlining value of offerings		Y
Access step-by-step guide to install offering		N
Allow developers to easily find and use newly installed offerings		Y
Support air-gapped clusters		Y

4.15.0-0.okd-scos-2024-01-18-103522

Changes from 4.14.0-0.okd-scos-2024-05-06-124404

Complete Features

Feature Overview

MVP: bring the off-cluster build environment on-cluster

Done when