When you start learning system design, the hardest part is not the concepts themselves.

It is finding clear explanations of them in one place.

That is why having a single guide that covers all the essentials is such a game-changer.

Therefore, I’ve designed this guide to cover 50 of the most important system design concepts.

Think of it as your one-stop reference for understanding how real systems scale, stay reliable, communicate, and handle data.

My goal is to walk you through fifty important ideas using short explanations and simple examples so everything clicks quickly.

If you are preparing for a system design interview, this guide is your go-to resource.

I. Core Architecture Principles

Vertical vs Horizontal Scaling

Vertical scaling means upgrading a single machine, like adding more CPU, RAM, or faster storage.
Horizontal scaling means adding more machines and spreading work across them.

Vertical is easier but hits hardware limits and becomes expensive.

Horizontal is harder because you need load balancing, stateless services, and shared storage.

Think of it this way: vertical is one superhero getting stronger, horizontal is building a team.

CAP Theorem

CAP Theorem says that in the presence of a network partition, a distributed system must choose between Consistency and Availability. Consistency means every user sees the same data at the same time.

Availability means the system always responds, even if the data might be slightly stale.

You cannot have perfect consistency and perfect availability when your network is broken, so you decide which one to sacrifice for your use case.

PACELC Theorem

PACELC extends CAP and says: if there is a Partition, choose Availability or Consistency; Else choose Latency or Consistency.

Even when the network is fine, you still trade off slow but consistent reads vs fast but eventually consistent reads. Systems that sync across regions often pay in latency to keep strong consistency.

It explains why some databases are fast but slightly stale, while others are slower but always accurate.

ACID vs BASE

ACID is about strict, reliable transactions: Atomicity, Consistency, Isolation, Durability. It suits financial systems, inventory, and anything where mistakes are very costly.

BASE stands for Basically Available, Soft state, Eventual consistency and is used in large distributed systems that need to stay up and respond quickly.

BASE systems might show temporary inconsistencies but fix themselves over time.

In practice, many architectures combine both, using ACID for core money flows and BASE for things like feeds and analytics.

Throughput vs Latency

Throughput is how many requests your system can handle per second.
Latency is how long a single request takes from start to finish.

You can often increase throughput by doing more work in parallel, but that may increase latency if queues build up.

Think of a restaurant that takes many orders at once but makes customers wait longer. Good system design tries to balance both: enough throughput for peak load but low latency for a smooth user experience.

Amdahl’s Law

Amdahl’s Law says that the speedup from parallelization is limited by the part that cannot be parallelized.

If 20 percent of your system is always sequential, no amount of extra machines will fix that bottleneck.

Let me break it down.

If your request always has to hit a single master database, that master will cap your performance. This law reminds you to hunt for bottlenecks instead of just adding more servers.
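To put numbers on this, Amdahl's Law is usually written as speedup = 1 / (s + (1 - s) / n). Here s is the sequential fraction and n is the number of workers. The following is a minimal sketch of that formula. The function name is illustrative.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only the parallel part scales with workers."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    # Serial time is fixed; parallel time shrinks by a factor of `workers`.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# With 20 percent sequential work, the speedup approaches but never reaches 5x.
for n in (1, 10, 100, 1000):
    print(f"{n:>5} workers -> {amdahl_speedup(0.2, n):.2f}x")
```

This prints 1.00x, 3.57x, 4.81x and 4.98x. Going from 100 to 1,000 workers buys almost nothing, because the sequential 20 percent dominates.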

Strong vs Eventual Consistency

Strong consistency means all users see the same data immediately after a write.
Eventual consistency means updates spread over time and nodes may briefly disagree.

Strong consistency is easier to reason about but usually slower and less available under failures.

Eventual consistency is great for large-scale systems like timelines or counters where perfect freshness is not critical.

The key is to choose the model that matches the user experience you need.

Stateful vs Stateless Architecture

A stateful service remembers user context between requests, often storing session data locally.
A stateless service treats every request as new, relying on external stores like caches or databases for any state.

Stateless services are easier to scale horizontally because any instance can handle any request.

Stateful systems can be simpler to code but harder to load balance and fail over.

In modern cloud systems, we try to push state into databases and keep services as stateless as possible.

Microservices vs Monoliths

A monolith is a single application that contains many features in one deployable unit.

Microservices split features into separate services that communicate over the network.

Microservices help teams work independently and scale different parts separately, but introduce complexity around communication, debugging, and data consistency.

Monoliths are simpler to start with and often fine up to a certain scale. Here is the tricky part.

Many great systems start as monoliths and gradually evolve into microservices when the pain is real.

Serverless Architecture

Serverless lets you run small functions in the cloud without managing servers directly. You pay only when your code runs, and the platform handles scaling and infrastructure for you.

It is ideal for event-driven workloads such as webhooks, background jobs, or light APIs with spiky traffic.

The tradeoffs are less control over long-running tasks (platforms typically cap execution time), cold-start latency when a function has not run recently, and sometimes a higher cost at very high, sustained volumes.

Think of serverless as “functions as a service,” perfect for glue code and lightweight services.

II. Networking and Communication

Load Balancing

Load balancing spreads incoming traffic across multiple servers so no single server gets overloaded. It improves both reliability and performance, since a single server’s failure does not bring down the entire system.

Load balancers can be hardware devices or software services. They often support health checks so they stop sending traffic to unhealthy instances.

From an interview point of view, they are your first building block when scaling horizontally.

Load Balancing Algorithms

Common load balancing algorithms include Round Robin, Least Connections, and IP Hash.

Round Robin cycles through servers in order and is simple to implement.
Least Connections sends traffic to the server with the fewest active connections, which helps when requests vary in length.
IP Hash uses a hash of the client IP so the same user usually goes to the same server, which helps with simple session stickiness.

Picking the right algorithm affects fairness, resource usage, and user experience.
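The three algorithms can be sketched in a few lines each. This is a simplified, single-process illustration, and the class names are hypothetical. Real load balancers also track health checks and weights. Python's built-in `hash()` is randomized per process for strings, so the IP Hash variant uses `hashlib` to keep the client-to-server mapping stable across restarts.

```python
import hashlib
import itertools


class RoundRobin:
    """Cycles through servers in a fixed order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Sends each request to the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1  # caller must call release() when the request ends
        return server

    def release(self, server):
        self.active[server] -= 1


class IPHash:
    """Maps the same client IP to the same server while the pool is unchanged."""

    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, client_ip: str):
        digest = hashlib.sha256(client_ip.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index]
```

IP Hash stickiness breaks when servers are added or removed, because the modulo changes. That is one reason session state is better kept outside the servers.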

Reverse Proxy vs Forward Proxy

A reverse proxy sits in front of servers and represents them to clients. It hides the inner topology, can do TLS termination, caching, compression, and routing.
A forward proxy sits in front of clients and represents them to the outside world, often for security, caching, or content filtering.

Think of a reverse proxy as the reception desk of a company that hides all the internal rooms, and a forward proxy as a gateway your laptop must pass through to reach the internet.

Knowing the difference helps when you talk about API gateways and corporate proxies.

API Gateway

An API gateway is a special reverse proxy that acts as a single entry point for all API calls in a microservices system. It handles routing to the right service, rate limiting, authentication, logging, and sometimes response shaping.

This reduces complexity on the client side, since clients only talk to a single endpoint.

If you put too much logic in the gateway, it can become a bottleneck or a mini monolith of its own. Good designs keep it focused and thin.

CDN (Content Delivery Network)

A CDN is a network of geographically distributed servers that cache static content like images, videos, and scripts closer to users.

When a user requests content, they are routed to the nearest CDN node, which greatly reduces latency. This also offloads traffic from your origin servers, improving scalability and resilience.

CDNs are essential for global applications and front-end performance.

Think of them as “local copies” of your website’s heavy files sprinkled around the world.

DNS (Domain Name System)

DNS maps human readable domain names to IP addresses.

When you type a website name, your device queries DNS to find the numeric address of the server.

DNS has multiple layers of caching, so responses are fast after the first lookup. It can also be used to perform simple load balancing by returning different IPs for the same name.

Understanding DNS helps you reason about why name changes take time to propagate and why some outages are caused by misconfigured DNS.

TCP vs UDP

TCP is a reliable, connection-oriented protocol. It guarantees ordered, error-checked delivery by using acknowledgments and retransmissions.
UDP is connectionless and does not guarantee delivery or order, which gives it lower overhead and latency than TCP.

TCP suits APIs, web pages, and file transfers where accuracy matters.

UDP works well for real-time applications like video calls or games, where occasional packet loss is acceptable.

Think of TCP as registered mail and UDP as quick postcards.

HTTP/2 and HTTP/3 (QUIC)

HTTP/2 introduced multiplexing, which lets multiple requests share a single TCP connection, reducing overhead. It also brought features like header compression and server push (which major browsers have since dropped).
HTTP/3 runs over QUIC, which is built on UDP. It speeds up connection setup and avoids TCP's head-of-line blocking, where one lost packet stalls every stream on the connection, so it performs better on unreliable networks. These versions mainly aim to reduce latency and make better use of modern network conditions.

For you as an engineer, the key idea is: fewer connection setups and better use of a single connection.

gRPC vs REST

REST typically uses HTTP with JSON and focuses on resources like /users or /orders. It is simple, human-readable, and widely used for public APIs.
gRPC uses HTTP/2 and binary-encoded messages (Protocol Buffers), which are smaller and faster over the wire. It also supports bidirectional streaming and strong typing.

In microservices, gRPC is often preferred for service-to-service calls, while REST is common for external clients.

Use REST when readability and compatibility matter, gRPC when performance and contracts matter.

WebSocket and Server-Sent Events (SSE)

WebSockets create a full-duplex connection where client and server can send messages to each other at any time.

SSE allows the server to push events to the client over a one-way channel using HTTP.

WebSockets are great for chats, multiplayer games, and live collaboration.

SSE is simpler and fits cases like live score updates or notifications, where only the server needs to push updates.

Both solve real-time communication problems that plain HTTP cannot handle well.

Long Polling

Long polling is a technique where the client sends a request and the server holds it open until there is new data or a timeout.

When the response comes back, the client immediately opens another request. This simulates real-time updates over plain HTTP without special protocols.

It is less efficient than WebSockets but easier to implement and works through most proxies and firewalls.

Think of it as asking “anything new?” and waiting quietly until there is an answer.

Gossip Protocol

A gossip protocol lets nodes in a distributed system share information by periodically talking to random peers.

Over time, information spreads like gossip in a social group until everyone has roughly the same view. It is used to share membership, health status, or configuration in a fault-tolerant way.

The protocol is eventually consistent and does not rely on a central authority. This makes it ideal for large clusters where nodes frequently join and leave.

III. Database and Storage Internals

Sharding (Data Partitioning)

Sharding splits data across multiple machines, each holding a subset of the data. Common strategies include range-based sharding, hash-based sharding, and directory-based sharding.

The main goal is to scale storage and throughput by avoiding a single giant database node.

The tricky part is choosing a shard key that avoids hot spots where one shard gets most of the traffic. Once you shard, moving data between shards (resharding) becomes an important operational challenge.
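Here is a minimal sketch of hash-based sharding. It assumes MD5 as the hash, used only for even distribution and not for security. It also measures the resharding problem: with plain `hash % N`, adding a single shard relocates most keys. The function names and the `user:` key format are illustrative.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Hash-based sharding: a stable hash of the shard key, modulo shard count."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def fraction_moved(keys, old_shards: int, new_shards: int) -> float:
    """Share of keys that land on a different shard after resharding."""
    moved = sum(1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards))
    return moved / len(keys)


user_ids = [f"user:{i}" for i in range(10_000)]
print(shard_for("user:42", 4))
print(f"{fraction_moved(user_ids, 4, 5):.0%} of keys move going from 4 to 5 shards")
```

A key stays put only when `h % 4 == h % 5`, which happens for 4 out of every 20 hash values. About 80 percent of the data therefore has to move. Consistent hashing exists to avoid exactly this.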

Replication Patterns (Master-Slave, Master-Master)

Replication means keeping multiple copies of data on different nodes.

In master-slave (or primary-replica) replication, one node handles writes and replicates changes to others that serve reads.
In master-master (multi-primary) replication, multiple nodes accept writes and must reconcile conflicts.

Replication improves read performance and availability, but makes consistency harder, especially when writes go to multiple nodes.

In interviews, expect to talk about how replication lag affects reads and how failover works when a master dies.

Consistent Hashing

Consistent hashing is a technique to distribute keys across nodes in a way that minimizes data movement when nodes are added or removed.

Keys and nodes are placed on a logical ring, and each key belongs to the next node on the ring.

When a node joins or leaves, only a small portion of keys need to move. This property is very helpful in distributed caches and databases.

Think of it as a smooth mapping that does not get scrambled when the cluster size changes.
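A minimal hash ring can be built on a sorted list and binary search. This sketch assumes MD5 positions. It also adds virtual nodes, a common refinement not covered above: each server is placed on the ring many times so load spreads more evenly. The class and parameter names are illustrative.

```python
import bisect
import hashlib


def _position(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class ConsistentHashRing:
    """Each key belongs to the next node position clockwise on the ring."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []    # sorted positions on the ring
        self._owners = {}  # position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = _position(f"{node}#{i}")
            bisect.insort(self._ring, pos)
            self._owners[pos] = node

    def remove(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = _position(f"{node}#{i}")
            self._ring.remove(pos)
            del self._owners[pos]

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # First position strictly after the key's hash, wrapping around at the end.
        idx = bisect.bisect_right(self._ring, _position(key)) % len(self._ring)
        return self._owners[self._ring[idx]]
```

When a fifth node joins a four-node ring, only the keys that now fall just before its positions move, roughly one fifth of them, and they all move to the new node.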

Database Indexing (B-Trees, LSM Trees)

Indexes speed up queries by organizing data in a way that allows fast lookup.

B-Trees are balanced trees that keep data sorted and let you find single keys and ranges efficiently. They are common in relational databases.

LSM Trees batch writes in memory and periodically flush them to disk as sorted files. This makes writes very fast but reads more complex, because a lookup may have to check several files.

Broadly, B-Trees favor read-heavy workloads and LSM Trees favor write-heavy ones.

The key idea is that indexes are a separate structure that must be updated on every write, which is why too many indexes hurt insert performance.

Write Ahead Logging (WAL)

Write Ahead Logging records changes to a log before applying them to the main database.

If a crash happens in the middle of a transaction, the system can replay the log to restore a consistent state. WAL ensures durability and atomicity of transactions. It also allows techniques like replication from the log stream. Let me tell you why it is important.

Without WAL, a crash could leave your data in a half-updated, corrupt state.
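The core rule is "log first, then apply." The toy key-value store below illustrates it. The log format (one JSON record per line) and the class name are assumptions for illustration. There is no checkpointing or multi-key transaction support. Real WALs also checksum each record and truncate a torn tail before appending again, which this sketch does not do.

```python
import json
import os
import tempfile


class WALStore:
    """Toy key-value store that logs each change durably before applying it."""

    def __init__(self, log_path: str):
        self.log_path = log_path
        self.data = {}
        self._replay()  # rebuild in-memory state from the log after a restart

    def _replay(self) -> None:
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as log:
            for line in log:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn final record from a crash mid-write: ignore it
                self._apply(record)

    def _apply(self, record: dict) -> None:
        if record["op"] == "set":
            self.data[record["key"]] = record["value"]
        elif record["op"] == "delete":
            self.data.pop(record["key"], None)

    def _log(self, record: dict) -> None:
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())  # on disk before we touch in-memory state

    def set(self, key: str, value) -> None:
        record = {"op": "set", "key": key, "value": value}
        self._log(record)    # 1. write ahead
        self._apply(record)  # 2. then apply

    def delete(self, key: str) -> None:
        record = {"op": "delete", "key": key}
        self._log(record)
        self._apply(record)


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "wal.log")
        store = WALStore(path)
        store.set("a", 1)
        store.set("b", 2)
        store.delete("a")
        # Simulate a crash that left half a record at the end of the log.
        with open(path, "a", encoding="utf-8") as log:
            log.write('{"op": "set", "ke')
        recovered = WALStore(path)  # "restart": replay the log
        return recovered.data
```

Calling `demo()` returns `{"b": 2}`. Every fully logged change survives the simulated crash, and the half-written record is discarded instead of corrupting state.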

Normalization vs Denormalization

Normalization organizes data into tables that reduce redundancy and dependencies, following rules like first normal form, second normal form, and so on. This avoids anomalies on updates and inserts.
Denormalization intentionally duplicates data to speed up reads and reduce joins. In high-scale systems, denormalization is common for read-heavy paths, such as storing user names along with posts instead of joining every time.

The real skill is knowing where you can safely denormalize without breaking consistency.

Polyglot Persistence

Polyglot persistence means using multiple types of databases within the same system, each chosen for what it does best. You might use a relational database for transactions, a document store for logs, a key-value store for caching, and a graph database for relationships.

Instead of forcing everything into one database, you pick the right tool for each job.

The tradeoff is more operational complexity and more knowledge required from the team.

Bloom Filters

A Bloom filter is a space-efficient data structure that quickly answers “might this item be in the set?” with possible false positives but no false negatives. It uses multiple hash functions to set bits in a bit array when items are inserted.

To check membership, you test the same bits; if any bit is zero, the item is definitely not present.

Databases and caches use Bloom filters to skip disk reads or network lookups for keys that are definitely absent.

Think of them as fast gatekeepers that say “definitely not” or “maybe.”
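Here is a compact sketch of the idea. Instead of k independent hash functions, it derives k bit positions from one SHA-256 digest using double hashing (position_i = h1 + i * h2). This is a standard substitute, named here plainly. The default bit-array size and hash count are arbitrary illustration values. In practice they are sized from the expected item count and the target false-positive rate.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: k positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # Any zero bit means "definitely not"; all ones means "maybe".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

A store would call `might_contain(key)` before a disk read and skip the read whenever it returns `False`.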

Vector Databases

Vector databases store and query vectors, which are numeric representations of data such as text, images, or audio. These vectors come from models like embeddings and allow similarity search, such as “find documents most similar to this one.”

Instead of exact equality comparisons, they use distance metrics like cosine similarity or Euclidean distance. This is essential for modern search, recommendation, and AI assistant systems.

In interviews, it is enough to know that vector databases support nearest-neighbor search (usually approximate, for speed) over high-dimensional data.
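To make "similarity search" concrete, this is exact, brute-force nearest-neighbor search by cosine similarity. The three-dimensional vectors are made-up toy values standing in for real embeddings. Production vector databases typically replace the linear scan with approximate indexes (for example HNSW) to stay fast over millions of vectors.

```python
import math


def cosine_similarity(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(query, vectors: dict, k: int = 2):
    """Exact nearest neighbors: score every stored vector, keep the top k."""
    ranked = sorted(vectors.items(),
                    key=lambda item: cosine_similarity(query, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


docs = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.3, 0.0],
    "cars": [0.0, 0.1, 0.95],
}
print(nearest([1.0, 0.2, 0.0], docs))
```

The query points in nearly the same direction as "cats" and "dogs", so those two rank first, and "cars" is left out.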

IV. Reliability and Fault Tolerance

Rate Limiting

Rate limiting controls how many requests a user, IP, or API key can make in a given time window. It protects your system from abuse, accidental traffic spikes, and runaway loops.

Common strategies include fixed window, sliding window, and token bucket.

Rate limits are often enforced at the API gateway or load balancer.

Think of them as safety brakes that keep shared resources from being overwhelmed.
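Of the strategies listed, the token bucket is the easiest to show in full. The bucket holds up to `capacity` tokens and refills at `rate` tokens per second. Each request spends one token, so short bursts are allowed but the long-run rate is capped. This is a single-process sketch with hypothetical names. The injectable `clock` exists only to make it testable. A real limiter keeps one bucket per user, IP, or API key, often in a shared store.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last call, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A gateway might keep a dict from API key to `TokenBucket`, and answer HTTP 429 (Too Many Requests) whenever `allow()` returns `False`.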

Circuit Breaker Pattern

A circuit breaker monitors calls to a remote service and “opens” if there are too many failures.

When open, it immediately fails new requests instead of trying the broken service again.

After a cooldown period, it allows a few trial calls to see if the service has recovered and closes if they succeed. This pattern prevents cascading failures where one slow service drags down the entire system.

Here is the tricky part. Circuit breakers must be tuned carefully so they do not open too aggressively or too late.
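The closed, open, and half-open cycle fits in a small class. This sketch makes simplifying assumptions. It counts consecutive failures, it lets exactly one trial call through after the cooldown rather than "a few," and it is not thread-safe. `failure_threshold` and `reset_timeout` are the hypothetical tuning knobs that the paragraph above warns must be chosen carefully.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected immediately."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("failing fast: remote service marked unhealthy")
            self.state = "half_open"  # cooldown over: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        self._record_success()
        return result

    def _record_failure(self):
        self.failures += 1
        # A failed trial call, or too many failures in a row, (re)opens the circuit.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()

    def _record_success(self):
        self.failures = 0
        self.state = "closed"
```

Callers wrap each remote call as `breaker.call(fetch_profile, user_id)`, where `fetch_profile` is a hypothetical client function. They can treat `CircuitOpenError` as a signal to serve a fallback.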

Bulkhead Pattern

The bulkhead pattern isolates parts of a system so a failure in one area does not sink everything. This can mean separate connection pools, thread pools, or even entire service clusters for different features.

If one bulkhead is flooded with traffic, others keep working.

The name comes from ship bulkheads that contain flooding in one compartment.

In design discussions, using bulkheads shows you are thinking about fault isolation and blast radius.

Retry Patterns and Exponential Backoff

Retries help recover from transient errors like network timeouts or temporary overload.

Exponential backoff means each retry waits longer than the previous one, such as 1 second, 2 seconds, 4 seconds, and so on. This prevents your client from hammering a service that is already struggling.

Good retry policies also add jitter (small randomness) so that many clients do not retry at the same moment and create a thundering herd.

Let me break it down.

Retries without backoff can make outages worse instead of helping.
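Here is a minimal retry helper with exponential backoff and jitter. It uses the "full jitter" variant, which sleeps a random amount between zero and the current backoff. That is one common choice among several. Which exceptions count as transient is an assumption here (`ConnectionError` and `TimeoutError`). The `sleep` and `rng` parameters exist only so the behavior can be tested without waiting.

```python
import random
import time


def retry_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       retry_on=(ConnectionError, TimeoutError),
                       sleep=time.sleep, rng=random.random):
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential growth: 1s, 2s, 4s, ... capped at max_delay.
            backoff = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random fraction of the backoff window.
            sleep(rng() * backoff)
```

Only retry operations that are safe to repeat. That is why retries and idempotency usually come up together.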

Idempotency

An operation is idempotent if performing it multiple times has the same effect as performing it once.

For example, “set user status to active” is idempotent, while “increment account balance by 10” is not.

Idempotency is critical when systems use retries, because the same request may be sent more than once.

APIs often require idempotency keys on operations like payments to avoid double charging.

In interviews, always mention idempotency when you talk about at-least-once delivery or retries.
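An idempotency key makes a non-idempotent operation like "charge 10" safe to retry. The server remembers the response for each key and replays it for duplicates. The `PaymentService` below is hypothetical and in-memory. A real service would store keys in a database, check and insert them atomically (for example with a unique constraint), and expire them after some time.

```python
import uuid


class PaymentService:
    """Hypothetical payment API that deduplicates requests by idempotency key."""

    def __init__(self):
        self.total_charged = {}  # account -> amount charged so far
        self._responses = {}     # idempotency key -> first response

    def charge(self, idempotency_key: str, account: str, amount: int) -> dict:
        # A retry with a key we have already seen gets the stored response
        # instead of a second charge.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        self.total_charged[account] = self.total_charged.get(account, 0) + amount
        response = {"charge_id": str(uuid.uuid4()), "account": account, "amount": amount}
        self._responses[idempotency_key] = response
        return response
```

The client generates the key once per logical payment, for example from the order ID, and sends the same key on every retry.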

Heartbeat

A heartbeat is a periodic signal sent by a service or node to indicate that it is alive and healthy.

Monitoring systems or coordinators listen for heartbeats.

If they stop receiving them, they mark the node as down and trigger failover or scaling actions.

Heartbeats are simple but powerful tools for liveness detection. Think of them as the system’s “pulse checks.”
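On the receiving side, a heartbeat monitor only needs the last time it heard from each node. Here is a minimal sketch. The timeout value and names are illustrative, and `clock` is injectable for testing. Real failure detectors often wait for several missed beats, or use adaptive schemes, to avoid false alarms from one delayed packet.

```python
import time


class HeartbeatMonitor:
    """Marks a node as down if no heartbeat arrives within `timeout` seconds."""

    def __init__(self, timeout: float = 5.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen = {}  # node -> time of its most recent heartbeat

    def beat(self, node: str) -> None:
        self.last_seen[node] = self.clock()

    def down_nodes(self):
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)
```

A coordinator would poll `down_nodes()` periodically and trigger failover for anything it returns.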

Leader Election (Paxos, Raft)

Leader election is the process of choosing a single node to act as a coordinator among many.

Consensus algorithms like Paxos and Raft ensure that at most one leader is elected for a given term and that all nodes eventually agree on who that leader is.

The leader handles tasks like assigning work, managing metadata, or ordering writes. If the leader fails, a new one is elected automatically.

You do not need to memorize the math for interviews, but you should know that consensus algorithms power many critical systems like metadata stores and distributed logs.

Distributed Transactions (SAGA Pattern)

A distributed transaction spans multiple services or databases.

The SAGA pattern models such a transaction as a sequence of local steps with compensating actions for rollbacks.

Instead of locking everything like a single ACID transaction, each service performs its part and publishes an event. If something fails, compensating steps attempt to undo previous changes. This fits naturally with microservices and eventual consistency.

The tradeoff is more complex logic and the possibility of partial failures that must be handled gracefully.
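The compensation logic can be sketched as an orchestrated saga: one coordinator runs each local step and, on failure, runs the compensations of the completed steps in reverse order. This is a swapped-in variant. The paragraph above describes services publishing events (choreography), and here plain in-process callables stand in for calls to separate services. Failures inside compensations are not handled, and a real saga would persist its progress.

```python
def run_saga(steps):
    """steps: list of (name, action, compensate). Returns True if every step succeeds."""
    completed = []  # (name, compensate) for steps that already succeeded
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            # Undo what already happened, most recent step first.
            for _, undo in reversed(completed):
                undo()
            return False
        completed.append((name, compensate))
    return True


# Hypothetical order flow: reserve stock, charge the card, ship.
log = []

def charge_card():
    raise RuntimeError("card declined")

order_saga = [
    ("reserve", lambda: log.append("reserve"), lambda: log.append("release")),
    ("charge", charge_card, lambda: log.append("refund")),
    ("ship", lambda: log.append("ship"), lambda: log.append("cancel")),
]
print(run_saga(order_saga), log)
```

The charge fails, so only the reservation is compensated. Shipping never starts, and no refund is issued because nothing was charged.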

Two Phase Commit (2PC)

Two Phase Commit is a protocol that tries to provide atomic transactions across multiple nodes.

In the first phase, the coordinator asks all participants if they can commit.
In the second phase, if everyone agrees, it tells them to commit; otherwise, it tells them to roll back.

2PC provides strong guarantees but can block if the coordinator fails, and it is expensive at scale due to locking.

In modern cloud systems, 2PC is often avoided for high throughput paths and replaced by patterns like SAGA.

V. Caching and Messaging

Caching

Caching stores frequently accessed data in a fast storage layer, usually memory, to reduce latency and backend load.

Common cache layers include in-process caches, external key-value stores, and CDNs. Caching is especially effective for read-heavy workloads and expensive computations.

Here is the tricky part. Stale data and invalidation make caching harder than it first appears.

As the saying goes, cache invalidation is one of the hard problems in computer science.

Caching Strategies (Cache Aside, Write Through, etc.)

Cache aside means the application reads from the cache, and on a miss, loads from the database and writes to the cache.
Write through writes to the cache and the database as part of the same operation, keeping the cache in sync with the source.
Write back writes to the cache first and flushes to the database later, which is fast but can lose data if the cache fails before the flush.

Each strategy balances freshness, complexity, and performance differently.

Interviewers love when you mention which strategy you would pick for a given scenario.
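Cache aside is the most common of the three, so here is a minimal sketch. It assumes an in-memory dict as the cache and a TTL as a safety net against stale entries. `load_from_db` is a hypothetical stand-in for a real database query. After a write, the application invalidates the key so that the next read reloads it.

```python
import time


class CacheAside:
    """Cache-aside reads with a TTL; `load_from_db` is a hypothetical DB call."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self.load_from_db = load_from_db
        self.ttl = ttl_seconds
        self.clock = clock
        self._cache = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._cache.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]                 # hit: serve from the cache
        value = self.load_from_db(key)      # miss or expired: read the source
        self._cache[key] = (value, self.clock() + self.ttl)
        return value

    def invalidate(self, key):
        # Call after updating the database so the next read loads fresh data.
        self._cache.pop(key, None)
```

Invalidating instead of updating the cache on writes is a deliberate choice. It avoids a race where two concurrent writers leave the cache holding the older value.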

Cache Eviction Policies (LRU, LFU)

Cache eviction policies decide which items to remove when the cache is full.

LRU (Least Recently Used) evicts items that have not been accessed recently, assuming recent items are more likely to be used again.
LFU (Least Frequently Used) evicts items that are rarely accessed, focusing on long-term popularity.

Some systems use random, FIFO, or advanced algorithms.

The key idea is that cache space is limited, so you want to keep the most valuable items in memory.
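LRU is easy to implement with an ordered map. Every access moves a key to the "recent" end, and eviction pops from the "old" end. This sketch uses Python's `collections.OrderedDict`. For memoizing function calls, the standard library already provides `functools.lru_cache`.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # least recently used first, most recent last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as recently used
        return self._items[key]

    def put(self, key, value) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry
```

With capacity 2, the sequence put a, put b, get a, put c evicts b, because a was touched more recently.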

Message Queues (Point-to-Point)

A message queue allows one component to send messages to another without needing both to be online at the same time.

In a point-to-point model, each message in a queue is consumed by a single receiver and then removed. This decouples sender and receiver so they can scale and fail independently.

Queues are great for background jobs, email sending, and processing heavy tasks asynchronously.

Think of them as a to-do list shared between services.

Pub/Sub (Publish-Subscribe)

In pub/sub, publishers send messages to topics, not directly to consumers.

Subscribers listen to topics they care about and receive copies of relevant messages. This enables broadcast-style communication and loose coupling between producers and consumers.

Multiple services can react to the same event in different ways, such as logging, analytics, and notifications.

In interviews, pub/sub often appears in event-driven designs like activity feeds or event sourcing.
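The fan-out behavior is easiest to see in a tiny in-memory broker. It is synchronous and in-process purely for illustration. Real brokers deliver asynchronously, persist messages, and track each subscriber's position. The topic name `order.placed` and the handlers are hypothetical.

```python
from collections import defaultdict


class InMemoryBroker:
    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic: str, callback) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message) -> int:
        # The publisher only knows the topic; every subscriber receives the message.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


broker = InMemoryBroker()
broker.subscribe("order.placed", lambda m: print("analytics saw", m))
broker.subscribe("order.placed", lambda m: print("email service saw", m))
broker.publish("order.placed", {"order_id": 1})
```

Adding a third consumer, say for notifications, is one more `subscribe` call. The publisher does not change.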

Dead Letter Queues

A dead letter queue stores messages that could not be processed successfully after several attempts.

Instead of retrying forever and blocking the main queue, these messages are moved aside.

Engineers can inspect the dead letter queue to debug issues, fix data, or replay messages later. This pattern improves resiliency and keeps your system from getting stuck on “poison messages.”

Think of it as a holding area for problematic jobs.
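Here is a minimal consumer loop that combines a point-to-point queue with a dead letter queue. The message shape (a dict with `body` and `attempts`) and `max_attempts` are illustrative assumptions. A failing message is requeued until it reaches the limit and is then parked with its last error, so the rest of the queue keeps flowing.

```python
from collections import deque


def drain(queue: deque, handler, max_attempts: int = 3):
    """Process messages; park ones that keep failing in a dead letter queue."""
    dead_letters = []
    while queue:
        message = queue.popleft()  # point-to-point: one consumer takes it
        message["attempts"] = message.get("attempts", 0) + 1
        try:
            handler(message["body"])
        except Exception as exc:
            if message["attempts"] >= max_attempts:
                message["last_error"] = repr(exc)
                dead_letters.append(message)  # move aside instead of retrying forever
            else:
                queue.append(message)  # requeue for another attempt
    return dead_letters
```

Engineers can inspect the returned list, fix the bad payload or the bug, and push the messages back onto the main queue.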

VI. Observability and Security

Distributed Tracing

Distributed tracing tracks a single request as it flows through multiple services. Each service propagates a shared trace ID and records its own span (a timed unit of work), so you can reconstruct the full path of a request. This is extremely helpful when debugging slow responses or failures in microservice architectures.

Without tracing, you just see errors in isolation. With it, you see the whole story across services, queues, and databases.

SLA vs SLO vs SLI

An SLA (Service Level Agreement) is an external promise to customers, such as “99.9 percent uptime per month.”

An SLO (Service Level Objective) is an internal target that engineers aim to meet, usually stricter than the SLA. An SLI (Service Level Indicator) is the actual measured metric, such as measured uptime or request success rate.

Think of SLA as the contract, SLO as the goal, and SLI as the scoreboard.

In interviews, using these terms correctly shows maturity in thinking about reliability.

OAuth 2.0 and OIDC

OAuth 2.0 is a framework for delegated authorization. It lets users grant an application limited access to their resources without sharing passwords.

OIDC (OpenID Connect) builds on OAuth 2.0 to add authentication, letting clients verify who the user is and get user identity information. This is the basis of many “Login with X” flows.

The key idea is that an authorization server issues tokens that clients and APIs can trust.

TLS/SSL Handshake

TLS (the successor to the older, now-deprecated SSL) secures communication between client and server by encrypting data in transit.

During the handshake, the client and server agree on encryption algorithms, exchange keys securely, and verify certificates.

Once the handshake completes, all subsequent data is encrypted and safe from eavesdropping. This is what puts the little lock icon in your browser.

Without TLS, anyone on the network could read or modify sensitive traffic.

Zero Trust Security

Zero Trust is a security model that says: “Never trust, always verify.” It assumes that threats can exist both outside and inside the network.

Every request must be authenticated, authorized, and encrypted, even if it comes from within your data center or VPC. Access is granted based on identity, device posture, and context, not just on being “inside the firewall.”

In modern architectures, Zero Trust is becoming the default approach to secure system design.

Reference

Key Takeaways

System design is mostly about understanding trade-offs: consistency vs. availability, latency vs. throughput, simplicity vs. flexibility.
Scaling is not just “add more servers.” You must think about load balancing, sharding, replication, and bottlenecks.
Reliability patterns like rate limiting, circuit breakers, retries, and bulkheads exist because failures are normal in distributed systems.
Caching, queues, and pub-sub are your best friends for performance and decoupling, but they introduce new challenges around consistency and ordering.
Observability and security concepts such as tracing, SLIs, OAuth, TLS, and Zero Trust are essential for systems that are not just fast but also safe and debuggable.

Source: https://designgurus.substack.com/p/50-system-design-concepts-for-beginners

50 Core System Design Concepts

March 4, 2026

Luis Fernando Chaim


Executive Summary

This document synthesizes 50 fundamental concepts in system design, drawing from a comprehensive guide on the subject. The core insight is that effective system design is an exercise in managing trade-offs, particularly between consistency and availability, latency and throughput, and simplicity versus flexibility. Successful scaling extends beyond merely adding servers; it necessitates a deep understanding of load balancing, data sharding, replication, and bottleneck identification.


Reliability in distributed systems is not an accident but a deliberate architectural choice, achieved through patterns like rate limiting, circuit breakers, retries, and bulkheads, which are designed to handle expected failures gracefully. Performance and decoupling are significantly enhanced by tools such as caching, message queues, and publish-subscribe models, though these introduce their own complexities regarding data consistency and message ordering. Finally, modern systems must be built with observability and security as primary concerns, incorporating distributed tracing, service level indicators (SLIs), robust authentication (OAuth/OIDC), data-in-transit encryption (TLS), and a Zero Trust security posture to ensure they are not only performant but also safe, secure, and debuggable.

I. Core Architecture Principles

This section outlines the foundational principles and architectural choices that govern how systems are structured, scaled, and managed.

Vertical vs. Horizontal Scaling

Vertical Scaling: Involves upgrading a single machine by adding more CPU, RAM, or faster storage. It is simpler to implement but is constrained by hardware limits and becomes progressively more expensive. The analogy provided is a single superhero getting stronger.
Horizontal Scaling: Involves adding more machines and distributing the workload across them. While more complex, requiring load balancing, stateless services, and shared storage, it offers greater scalability. The analogy is building a team of superheroes.

CAP Theorem

The CAP Theorem states that in a distributed system experiencing a network partition, it is impossible to simultaneously guarantee both Consistency and Availability.
Consistency: Every user sees the same data at the same time.
Availability: The system always provides a response, even if the data may be temporarily out of date.
A system must choose which of these two guarantees to sacrifice during a network failure.

PACELC Theorem

PACELC is an extension of the CAP theorem. It posits that: if there is a Partition, a system must choose between Availability and Consistency; Else (in normal operation), it must choose between Latency and Consistency.
This theorem clarifies that even without network failures, systems face a trade-off between fast, eventually consistent reads (lower latency) and slower, strongly consistent reads (higher consistency).

ACID vs. BASE

ACID (Atomicity, Consistency, Isolation, Durability): A set of properties for strict, reliable database transactions. It is essential for systems where data integrity is paramount, such as financial or inventory management systems.
BASE (Basically Available, Soft state, Eventual consistency): An alternative model for large-scale distributed systems that prioritize high availability and rapid response times. BASE systems may exhibit temporary inconsistencies that resolve over time.
Many modern architectures employ a hybrid approach, using ACID for critical transactional flows and BASE for less critical functions like activity feeds or analytics.

Throughput vs. Latency

Throughput: The number of requests a system can process per unit of time (e.g., requests per second).
Latency: The time taken to process a single request from start to finish.
These two metrics are often in opposition; increasing throughput by processing more work in parallel can lead to queue buildup and increased latency for individual requests. Effective system design seeks to balance both for an optimal user experience.

Amdahl’s Law

This law states that the potential performance improvement from parallelization is limited by the portion of the system that must remain sequential.
If a part of a process is inherently non-parallelizable (e.g., a final step that must hit a single master database), that part will become the ultimate bottleneck, capping overall performance regardless of how many more resources are added.

Strong vs. Eventual Consistency

Strong Consistency: Guarantees that all users see the same data immediately following a write operation. It is simpler to reason about but can be slower and less available during failures.
Eventual Consistency: Allows for a brief period where different nodes in a distributed system may have different versions of the data. Updates propagate through the system over time. This model is suited for large-scale applications where immediate consistency is not critical, such as social media timelines.

Stateful vs. Stateless Architecture

Stateful Service: Remembers user-specific context or session data between requests, often storing it locally. This can simplify application logic but complicates scaling, load balancing, and failover.
Stateless Service: Treats every request as new and self-contained, relying on external storage (e.g., databases, caches) for any required state. Stateless services are easier to scale horizontally, as any server instance can handle any request.

Microservices vs. Monoliths

Monolith: A single, unified application where all features are contained within one deployable unit. Monoliths are simpler to develop and deploy initially.
Microservices: An architectural style that splits application features into small, independent services that communicate over a network. This approach allows teams to work independently and scale different components separately but introduces complexity in communication, debugging, and data management.
A common evolutionary path is to start with a monolith and gradually break it apart into microservices as the system grows and its pain points become clear.

Serverless Architecture

Also known as “Functions as a Service” (FaaS), serverless architecture allows developers to run small, event-driven functions in the cloud without managing the underlying server infrastructure.
Advantages: Pay-per-use pricing and automatic scaling handled by the cloud provider. Ideal for workloads with spiky traffic like webhooks, background jobs, or simple APIs.
Trade-offs: Can involve “cold starts” (initial latency), less control over long-running tasks, and potentially higher costs at sustained high volumes.

II. Networking and Communication

This section covers the protocols, patterns, and components used to manage traffic and facilitate communication between different parts of a system.

Load Balancing

Function: Distributes incoming network traffic across multiple servers to prevent any single server from becoming a bottleneck.
Benefits: Improves both system performance and reliability, as the failure of one server does not bring down the entire application.
Implementation: Can be a hardware appliance or a software service. Load balancers typically use health checks to avoid sending traffic to unresponsive servers.

Load Balancing Algorithms

Round Robin: Distributes requests to servers sequentially in a circular order. Simple but does not account for server load or request complexity.
Least Connections: Sends new requests to the server with the fewest active connections. This is effective when requests have varying completion times.
IP Hash: Uses a hash of the client’s IP address to determine which server receives the request. This provides a basic form of “session stickiness,” ensuring a user is consistently routed to the same server.
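The three algorithms above can be sketched in a few lines of Python. This is a minimal in-memory illustration. The server names are placeholders, and real load balancers also track health checks and server weights. MD5 serves here only as a stable hash. Python's built-in hash() is randomized per process for strings, so it would not keep a client "sticky" across restarts.

```python
import hashlib
import itertools


class RoundRobin:
    """Hands out servers in a fixed circular order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Picks the server with the fewest in-flight requests."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)  # ties go to the first server
        self.active[server] += 1                        # caller must release() when done
        return server

    def release(self, server):
        self.active[server] -= 1


def ip_hash(client_ip, servers):
    """Maps the same client IP to the same server (basic session stickiness)."""
    digest = hashlib.md5(client_ip.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

Note that IP hashing remaps many clients whenever the server list changes, which is the problem consistent hashing addresses.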

Reverse Proxy vs. Forward Proxy

Reverse Proxy: Sits in front of a group of servers, intercepting client requests and forwarding them to the appropriate backend server. It can handle tasks like TLS termination, caching, compression, and routing, while hiding the internal network topology.
Forward Proxy: Sits in front of clients, forwarding their requests to the internet. It is often used for security, content filtering, or caching within a corporate or private network.

API Gateway

An API Gateway is a specialized reverse proxy that serves as the single entry point for all API calls in a microservices architecture.
Responsibilities: Handles routing, rate limiting, authentication, logging, and response transformation.
Benefit: Simplifies the client-side by providing a single, unified endpoint.
Risk: Can become a bottleneck or a “mini monolith” if too much business logic is embedded within it.

CDN (Content Delivery Network)

A CDN is a geographically distributed network of proxy servers that cache static assets (images, videos, CSS, JavaScript) close to end-users.
Function: When a user requests content, the request is routed to the nearest CDN node, dramatically reducing latency.
Benefits: Offloads traffic from origin servers, improves front-end performance, and increases application scalability and resilience.

DNS (Domain Name System)

DNS is the system that translates human-readable domain names (e.g., www.example.com) into machine-readable IP addresses (e.g., 192.0.2.1).
It operates with multiple layers of caching for fast lookups and can be used for basic load balancing by returning different IP addresses for the same domain name.

TCP vs. UDP

TCP (Transmission Control Protocol): A connection-oriented protocol that guarantees reliable, ordered, and error-checked delivery of data. It is suitable for applications where data integrity is critical, such as web browsing, file transfers, and APIs.
UDP (User Datagram Protocol): A connectionless protocol that is faster and has less overhead than TCP but does not guarantee delivery or order. It is well-suited for real-time applications like video streaming and online gaming, where speed is more important than perfect reliability.

HTTP/2 and HTTP/3 (QUIC)

HTTP/2: Improved upon HTTP/1.1 by introducing request multiplexing over a single TCP connection, header compression, and server push, all aimed at reducing latency.
HTTP/3: Further enhances performance by running over QUIC (a transport protocol built on UDP), which reduces connection setup time and performs better on unreliable networks with packet loss.

gRPC vs. REST

REST: An architectural style that typically uses HTTP and JSON. It is resource-oriented, human-readable, and widely adopted for public-facing APIs.
gRPC: A high-performance RPC framework that uses HTTP/2 for transport and Protocol Buffers (protobuf) for binary serialization. It is smaller and faster than REST/JSON and supports features like bidirectional streaming, making it a popular choice for internal service-to-service communication in microservices architectures.

WebSocket and Server-Sent Events (SSE)

WebSockets: Provide a persistent, full-duplex (two-way) communication channel between a client and a server over a single TCP connection. Ideal for real-time interactive applications like chat, collaborative editing, and multiplayer games.
SSE: A simpler protocol that allows a server to push updates to a client over a one-way channel using standard HTTP. It is suitable for use cases where only the server needs to send data, such as live news feeds or stock tickers.

Long Polling

A technique that simulates server-push functionality over standard HTTP. The client sends a request to the server, which holds the connection open until it has new data to send or a timeout occurs. Upon receiving a response, the client immediately initiates a new request.
It is less efficient than WebSockets but is easier to implement and compatible with older proxies and firewalls.

Gossip Protocol

A decentralized communication protocol where nodes in a distributed system share information by periodically exchanging data with random peers.
Information propagates through the network “like gossip,” ensuring that all nodes eventually converge on a consistent view without a central coordinator. It is highly fault-tolerant and used for service discovery, health monitoring, and state dissemination in large clusters.

III. Database and Storage Internals

This section details the techniques and technologies used to manage data at scale, focusing on partitioning, replication, indexing, and transactional integrity.

Sharding (Data Partitioning)

Definition: The process of splitting a large database into smaller, more manageable pieces called shards, with each shard residing on a separate machine.
Goal: To scale database storage capacity and throughput horizontally.
Strategies: Include range-based, hash-based, and directory-based sharding.
Challenge: Choosing an effective shard key is crucial to avoid “hot spots,” where one shard receives a disproportionate amount of traffic.
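Hash-based sharding is often implemented as a stable hash of the shard key modulo the number of shards. The sketch below assumes string user IDs and made-up shard counts. It also measures how many keys move when a shard is added.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Hash-based sharding: stable hash of the shard key, modulo the shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    moved = sum(shard_for(u, 4) != shard_for(u, 5) for u in users)
    # Plain modulo hashing relocates most keys (about 80% here) when one shard is added.
    print(f"{moved / len(users):.0%} of keys moved when going from 4 to 5 shards")
```

That large reshuffle on resizing is the main motivation for the consistent hashing technique described in this section.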

Replication Patterns

Definition: The practice of keeping multiple copies of data on different nodes to improve availability and read performance.
Master-Slave (Primary-Replica): One node (the master) handles all write operations, which are then replicated to one or more slave nodes that can serve read requests.
Master-Master (Multi-Primary): Multiple nodes can accept write operations, and they synchronize data with each other. This increases write availability but introduces complexity in resolving write conflicts.

Consistent Hashing

A hashing technique designed to minimize data re-shuffling when nodes are added to or removed from a distributed system (like a cache or database).
Both keys and nodes are mapped to a logical ring. A key is assigned to the first node encountered moving clockwise on the ring. This ensures that when a node is added or removed, only a small, adjacent set of keys needs to be remapped.
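A minimal Python sketch of such a ring is shown below. It assumes SHA-256 for ring positions and places each physical node at several "virtual node" positions so that keys spread more evenly. The node names and the vnodes default are illustrative.

```python
import bisect
import hashlib


def _position(value: str) -> int:
    # Stable 64-bit position on the ring (Python's hash() is randomized per process).
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes      # virtual nodes per physical node
        self._ring = []           # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_position(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def get_node(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # First virtual node clockwise from the key, wrapping around past the end.
        idx = bisect.bisect(self._ring, (_position(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

When a node is added, the only keys that change owner are the ones that now land on the new node. Every other key keeps its previous assignment.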

Database Indexing

Purpose: Indexes are data structures that improve the speed of data retrieval operations on a database table at the cost of slower writes and increased storage space.
B-Trees: Balanced tree structures common in relational databases. They keep data sorted and are efficient for both point lookups and range queries.
LSM (Log-Structured Merge) Trees: Optimize for high write throughput by batching writes in memory and periodically flushing them to sorted files on disk. Reads can be more complex as they may need to check multiple files.

Write-Ahead Logging (WAL)

A standard method for ensuring data durability and atomicity. Before any changes are applied to the database itself, they are first recorded in a sequential log file on durable storage.
In the event of a system crash, the database can replay the log to recover to a consistent state, preventing data corruption from partially completed transactions.

Normalization vs. Denormalization

Normalization: The process of organizing data in a relational database to minimize redundancy and improve data integrity by dividing larger tables into smaller, well-structured ones.
Denormalization: The intentional introduction of redundancy by duplicating data across multiple tables. This is often done in high-scale systems to optimize read performance by avoiding expensive join operations.

Polyglot Persistence

The practice of using multiple different database technologies within a single application, choosing the best tool for each specific job.
An application might use a relational database for transactional data, a document store for unstructured content, a key-value store for caching, and a graph database for relationship-heavy data. This adds operational complexity but allows for optimized performance and functionality.

Bloom Filters

A probabilistic, space-efficient data structure used to test whether an element is a member of a set.
It can produce false positives (it might incorrectly say an element is in the set) but never false negatives (if it says an element is not in the set, it is definitively not).
Bloom filters are used to avoid expensive lookups for items that are definitely not present. For example, a negative answer lets a system skip a database query or disk read entirely.
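Below is a compact Python Bloom filter. It sizes the bit array with the standard formulas m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2, and derives its k bit positions by double hashing from a single SHA-256 digest. This is a sketch under those assumptions, not a tuned production implementation.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        n, p = expected_items, false_positive_rate
        # m bits and k hash functions chosen for the target false-positive rate.
        self.size = max(1, math.ceil(-n * math.log(p) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.size / n * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: k positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```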

Vector Databases

Specialized databases designed to store, manage, and query high-dimensional vector embeddings, which are numerical representations of data like text or images.
They excel at similarity searches using distance metrics (e.g., cosine similarity), enabling applications like semantic search, recommendation engines, and other AI-powered features.
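The core query of a vector database can be illustrated with a brute-force cosine-similarity search. The toy 3-dimensional "embeddings" below are made up. Production vector databases typically use approximate nearest-neighbour indexes such as HNSW rather than scanning every vector.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Exact nearest neighbours by cosine similarity (scans every stored vector)."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in vectors.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


if __name__ == "__main__":
    docs = {
        "cat": [0.9, 0.1, 0.0],
        "kitten": [0.85, 0.15, 0.05],
        "car": [0.0, 0.2, 0.95],
    }
    print(top_k([1.0, 0.1, 0.0], docs, k=2))
```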

IV. Reliability and Fault Tolerance

This section explores patterns and strategies for building resilient systems that can withstand and recover from failures.

Rate Limiting

Function: Controls the frequency of requests a user or client can make to an API or service within a specific time window.
Purpose: Protects backend services from abuse, accidental overload, and denial-of-service attacks.
Strategies: Common algorithms include fixed window, sliding window, and token bucket.
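A minimal token bucket is sketched below. The bucket holds up to `capacity` tokens, refills at `rate` tokens per second, and each request spends one token. The code is single-process and not thread-safe. Distributed rate limiters usually keep this state in a shared store instead. The injectable clock parameter is an assumption added to make the behaviour easy to test.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the elapsed time, never exceeding the bucket's capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Compared with a fixed window, the token bucket permits short bursts while still enforcing the long-run average rate.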

Circuit Breaker Pattern

A pattern that prevents an application from repeatedly trying to execute an operation that is likely to fail.
Mechanism: A circuit breaker monitors calls to a downstream service. If the number of failures exceeds a threshold, the breaker “opens,” and subsequent calls fail immediately without attempting to contact the service. After a timeout, the breaker enters a “half-open” state to test if the service has recovered.
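The closed, open, and half-open states can be sketched as a small Python class. The thresholds, timeout, and injectable clock are illustrative assumptions. The class is not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without being attempted."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "half-open"   # allow a trial request through
            else:
                raise CircuitOpenError("circuit is open; failing fast")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        # Success closes the circuit and resets the failure count.
        self.state = "closed"
        self.failures = 0
        return result

    def _record_failure(self):
        self.failures += 1
        # A failed trial in half-open, or too many failures in closed, opens the circuit.
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```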

Bulkhead Pattern

An application design pattern that isolates system elements into pools so that if one fails, the others can continue to function.
Named after the partitioned sections of a ship’s hull, this pattern can be implemented by using separate thread pools or connection pools for different services, preventing a failure in one area from cascading and taking down the entire system.
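One way to build a bulkhead is to give each downstream dependency its own bounded semaphore. When that compartment is full, new calls are rejected instead of piling up. The sketch below follows that approach. The dependency names and limits are hypothetical.

```python
import threading
from contextlib import contextmanager


class BulkheadFullError(Exception):
    pass


class Bulkhead:
    """Caps concurrent calls to one dependency so it cannot exhaust shared resources."""

    def __init__(self, name: str, max_concurrent: int):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    @contextmanager
    def slot(self):
        # Reject immediately instead of queueing when this compartment is full.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError(f"bulkhead '{self.name}' is full")
        try:
            yield
        finally:
            self._slots.release()


# One compartment per downstream dependency (names and sizes are hypothetical).
payments = Bulkhead("payments", max_concurrent=10)
recommendations = Bulkhead("recommendations", max_concurrent=5)
```

If the recommendations service hangs, at most five threads get stuck waiting on it, and payment calls still have their own ten slots.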

Retry Patterns and Exponential Backoff

Retries: A mechanism for handling transient failures by automatically re-attempting a failed operation.
Exponential Backoff: A crucial enhancement to retries where the delay between attempts increases exponentially (e.g., 1s, 2s, 4s). This prevents a client from overwhelming a struggling service with rapid-fire retries. Adding “jitter” (a small random delay) is also recommended to avoid synchronized retry storms.
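The sketch below retries with exponential backoff and uses a common jitter strategy sometimes called "full jitter": each sleep is a random value between 0 and the current exponential cap. The exception types treated as transient and the default delays are assumptions. The `sleep` parameter is injectable so the function can be tested without waiting.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Calls `operation` until it succeeds or attempts run out."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise                                       # out of attempts: surface the error
            cap = min(max_delay, base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... capped
            sleep(random.uniform(0, cap))                   # jitter desynchronizes clients
```

Only errors that are plausibly transient should be retried. Retrying a validation error just repeats the failure.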

Idempotency

An operation is idempotent if it can be performed multiple times with the same result as performing it once. For example, setting a value is idempotent, while incrementing a counter is not.
Idempotency is critical in distributed systems where network failures can lead to retries, ensuring that a re-sent request does not cause unintended side effects like duplicate transactions.
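A common way to make a non-idempotent operation safe to retry is an idempotency key. The client generates a unique key for each logical operation and resends the same key on every retry. The server stores the first response under that key. The `PaymentService` class below is hypothetical and keeps state in memory. A real service would need an atomic insert in a shared database or cache to handle concurrent duplicates.

```python
import uuid


class PaymentService:
    """Hypothetical service that deduplicates requests by idempotency key."""

    def __init__(self):
        self._results = {}   # idempotency key -> stored response
        self.balance = 0

    def charge(self, idempotency_key: str, amount: int) -> dict:
        if idempotency_key in self._results:
            # Retry of an already-processed request: replay the original response.
            return self._results[idempotency_key]
        self.balance += amount          # the non-idempotent side effect
        response = {"charge_id": str(uuid.uuid4()), "amount": amount}
        self._results[idempotency_key] = response
        return response
```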

Heartbeat

A periodic signal sent from a node or service to a monitoring system to indicate it is alive and functioning correctly.
If the monitoring system stops receiving heartbeats from a node, it can assume the node has failed and trigger a failover process.
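A simple timeout-based failure detector is sketched below. It records when each node last sent a heartbeat and reports nodes that have been silent longer than a timeout. The timeout and injectable clock are illustrative. A timeout cannot tell a dead node from a slow one, so failover logic should tolerate occasional false suspicions.

```python
import time


class HeartbeatMonitor:
    """Marks a node as failed if no heartbeat arrived within `timeout` seconds."""

    def __init__(self, timeout: float, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen = {}   # node -> time of its most recent heartbeat

    def heartbeat(self, node: str) -> None:
        self.last_seen[node] = self.clock()

    def failed_nodes(self) -> list:
        now = self.clock()
        return sorted(n for n, seen in self.last_seen.items() if now - seen > self.timeout)
```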

Leader Election

The process in a distributed system by which a single node is chosen to assume a special role, such as a coordinator or primary for writes.
Consensus algorithms like Paxos and Raft provide fault-tolerant mechanisms to ensure that all nodes agree on a single leader and can elect a new one if the current leader fails.

Distributed Transactions (SAGA Pattern)

The SAGA pattern is a way to manage data consistency across multiple microservices without using traditional two-phase commit locks.
A transaction is structured as a sequence of local transactions, each with a corresponding compensating action. If any step fails, the compensating actions are executed in reverse order to undo the preceding steps, thus maintaining overall consistency.
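The compensation logic can be sketched as an in-process, orchestration-style saga. Each step is an (action, compensation) pair. If a step fails, the compensations of the steps that already completed run in reverse order. The order-processing steps below are hypothetical. A real saga also persists its progress and sends steps as messages between services, and its compensations must be safe to retry.

```python
def run_saga(steps):
    """Runs (action, compensation) pairs; on failure, undoes completed steps in reverse."""
    compensations = []
    try:
        for action, compensation in steps:
            action()                           # local transaction in one service
            compensations.append(compensation)
    except Exception:
        for compensation in reversed(compensations):
            compensation()                     # semantic undo of a committed step
        raise


if __name__ == "__main__":
    log = []

    def charge_card():
        raise RuntimeError("payment declined")

    order_saga = [
        (lambda: log.append("reserve inventory"), lambda: log.append("release inventory")),
        (lambda: log.append("create shipment"), lambda: log.append("cancel shipment")),
        (charge_card, lambda: log.append("refund card")),
    ]
    try:
        run_saga(order_saga)
    except RuntimeError as exc:
        print("saga failed:", exc)
    print(log)
```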

Two-Phase Commit (2PC)

A protocol used to achieve atomic transactions across multiple distributed nodes.
Phase 1 (Prepare): A coordinator asks all participating nodes if they are ready to commit.
Phase 2 (Commit/Abort): If all participants vote “yes,” the coordinator instructs them to commit. If any vote “no” or fail to respond, the coordinator instructs all to roll back.
2PC provides strong consistency but is prone to blocking if the coordinator fails and can be a performance bottleneck.
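The two phases can be simulated in-process as shown below. The `Participant` class and its `can_commit` flag are hypothetical stand-ins for real resource managers. A real implementation also writes votes and decisions to durable logs and handles timeouts and coordinator crashes, none of which this sketch models.

```python
class Participant:
    """Hypothetical resource manager that votes, then commits or rolls back."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> bool:
        # Phase 1: make the change durable but not yet visible, then vote.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants) -> bool:
    # Phase 1 (prepare): collect every vote.
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2 (commit): everyone voted yes.
        for p in participants:
            p.commit()
        return True
    # Phase 2 (abort): at least one "no", so everyone rolls back.
    for p in participants:
        p.rollback()
    return False
```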

V. Caching and Messaging

This section describes key technologies for improving performance and decoupling system components through in-memory data storage and asynchronous communication.

Caching

Definition: Storing copies of frequently accessed data in a fast, temporary storage layer (typically memory) to serve future requests more quickly.
Benefits: Reduces latency for end-users and decreases the load on backend systems like databases.
Challenge: The primary difficulty with caching is “cache invalidation” — ensuring that stale data is removed or updated when the source data changes.

Caching Strategies

Cache-Aside: The application is responsible for managing the cache. It first checks the cache; on a miss, it reads data from the database, then writes that data into the cache for future requests.
Write-Through: The application writes data to the cache and the database simultaneously. This ensures the cache is always consistent with the database but adds latency to write operations.
Write-Back: The application writes data only to the cache, which acknowledges the write immediately. The data is then flushed to the database asynchronously at a later time. This offers very low write latency but risks data loss if the cache fails before the data is persisted.
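Cache-aside reads with a time-to-live (TTL) can be sketched as follows. A dictionary stands in for a cache such as Redis or Memcached, and `load_from_db` is a placeholder for a real database query. The TTL value and injectable clock are assumptions.

```python
import time


class CacheAside:
    """Cache-aside reads with a TTL; `load_from_db` stands in for a real query."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self.load_from_db = load_from_db
        self.ttl = ttl_seconds
        self.clock = clock
        self._cache = {}   # key -> (value, expires_at)

    def get(self, key):
        entry = self._cache.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]                     # cache hit
        value = self.load_from_db(key)          # miss or expired: go to the source of truth
        self._cache[key] = (value, self.clock() + self.ttl)
        return value

    def invalidate(self, key):
        # Call after writing to the database so the next read reloads fresh data.
        self._cache.pop(key, None)
```

The TTL bounds how long stale data can survive if an invalidation is missed.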

Cache Eviction Policies

LRU (Least Recently Used): When the cache is full, the item that has been accessed least recently is removed.
LFU (Least Frequently Used): When the cache is full, the item that has been accessed the fewest times is removed.
Other policies include FIFO (First-In, First-Out) and random replacement. The choice of policy depends on the application’s access patterns.
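An LRU cache can be built on Python's `collections.OrderedDict`, which keeps keys in order of recency. The class below is a minimal sketch. For memoizing function calls, the standard library already provides `functools.lru_cache`.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()   # ordered from least to most recently used

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict the least recently used entry
```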

Message Queues (Point-to-Point)

A message queue enables asynchronous communication between services. A “producer” sends a message to a queue, and a “consumer” retrieves it for processing at a later time.
Each message is typically processed by only one consumer. This pattern decouples the sender and receiver, allowing them to operate and scale independently. It is commonly used for background jobs.

Pub/Sub (Publish-Subscribe)

A messaging pattern where “publishers” send messages to a “topic” without knowledge of the “subscribers.” Any number of subscribers can listen to a topic and receive a copy of every message sent to it.
This enables one-to-many, broadcast-style communication and is central to event-driven architectures.
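The one-to-many behaviour of pub/sub can be shown with a tiny in-memory broker. Every subscriber of a topic receives its own copy of each message, whereas in a point-to-point queue each message goes to a single consumer. This sketch delivers synchronously. Real brokers such as Apache Kafka or Google Cloud Pub/Sub deliver asynchronously and durably. The topic name is hypothetical.

```python
from collections import defaultdict


class Broker:
    """In-memory pub/sub: each subscriber of a topic gets every message."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # The publisher does not know who, if anyone, is listening.
        for callback in self._subscribers[topic]:
            callback(message)
```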

Dead Letter Queues (DLQ)

A secondary queue used to store messages that could not be processed successfully after a certain number of retries.
Moving “poison messages” to a DLQ prevents them from blocking the main processing queue. Engineers can later inspect the DLQ to diagnose and resolve the underlying issues.
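The sketch below shows a consumer loop that requeues a failing message and moves it to a dead letter queue after a fixed number of attempts. An in-memory `deque` stands in for a broker, and the message IDs are hypothetical. Managed brokers implement the same idea natively, for example Amazon SQS redrive policies based on a maximum receive count.

```python
from collections import deque


def process_queue(queue: deque, handler, max_attempts: int = 3):
    """Consumes messages; ones that keep failing are moved to a dead letter queue."""
    dead_letters = []
    attempts = {}
    while queue:
        message_id, payload = queue.popleft()
        try:
            handler(payload)
        except Exception as exc:
            attempts[message_id] = attempts.get(message_id, 0) + 1
            if attempts[message_id] >= max_attempts:
                # Poison message: park it with its error so the main queue keeps moving.
                dead_letters.append((message_id, payload, repr(exc)))
            else:
                queue.append((message_id, payload))   # requeue for another attempt
    return dead_letters
```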

VI. Observability and Security

This section covers essential concepts for monitoring system health, understanding behavior, and implementing robust security measures.

Distributed Tracing

A method for monitoring and profiling applications, especially those built using a microservices architecture.
It tracks a single request as it travels through multiple services, assigning a unique trace ID that allows developers to visualize the entire request path, identify bottlenecks, and debug cross-service issues.

SLA vs. SLO vs. SLI

SLA (Service Level Agreement): A formal contract with a customer that defines the level of service they can expect, often with financial penalties for non-compliance (e.g., “99.9% uptime”).
SLO (Service Level Objective): An internal target for system reliability, usually set stricter than the SLA to leave a safety margin. This is the goal that engineering teams strive to meet.
SLI (Service Level Indicator): The actual, quantitative metric used to measure compliance with an SLO (e.g., the success rate of HTTP requests). The SLI is the “scoreboard” that measures performance.
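The arithmetic behind these terms is simple. The SLI is measured from real traffic, and the gap between a target and 100% is often called the "error budget." The request counts below are hypothetical. A 99.9% availability target over 30 days allows 0.001 × 30 × 24 × 60 = 43.2 minutes of downtime.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: fraction of requests that succeeded."""
    return good_requests / total_requests if total_requests else 1.0


def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Downtime a time-based availability target allows over the window."""
    return (1.0 - slo) * window_days * 24 * 60


if __name__ == "__main__":
    sli = availability_sli(good_requests=999_412, total_requests=1_000_000)
    slo = 0.999
    print(f"SLI = {sli:.4%}, meets SLO: {sli >= slo}")
    print(f"Error budget: {error_budget_minutes(slo):.1f} minutes per 30 days")
```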

OAuth 2.0 and OIDC

OAuth 2.0: An authorization framework that allows a user to grant a third-party application limited access to their resources on another service without sharing their credentials.
OIDC (OpenID Connect): A thin layer built on top of OAuth 2.0 that adds an authentication component. It allows an application to verify a user’s identity and obtain basic profile information. Together, they form the foundation of modern “Login with…” features.

TLS/SSL Handshake

TLS (Transport Layer Security) and its now-deprecated predecessor SSL (Secure Sockets Layer): Cryptographic protocols that provide secure communication over a computer network.
The handshake is the initial process where the client and server establish a secure connection. During the handshake, they agree on an encryption cipher, exchange cryptographic keys, and authenticate the server via its digital certificate.

Zero Trust Security

A security model based on the principle of “never trust, always verify.” It assumes that threats can originate from anywhere, both inside and outside the network perimeter.
In a Zero Trust architecture, every request must be authenticated, authorized, and encrypted, regardless of its origin. Access is granted based on user identity and device posture, not on network location.

You can think of system design like running a professional restaurant. Vertical scaling is buying a bigger stove, while horizontal scaling is hiring a whole team of chefs. Load balancing is the host at the front door assigning customers to different tables so no waiter is overwhelmed. A CDN is like having pre-made snacks available at local convenience stores so people don’t have to travel to your main kitchen for everything. Finally, Circuit Breakers are like a safety fuse in the kitchen: if one appliance starts smoking, it cuts the power immediately to that section so the whole restaurant doesn’t burn down.

Source: https://medium.com/@MaheshwariRishabh/50-core-system-design-concepts-6828ed73c2e8

Top 50 System Design

March 4, 2026

Luis Fernando Chaim


Performance in a system design interview is a critical factor in judging whether a candidate can design scalable and efficient systems, and knowing the key terminology goes a long way toward acing it. Below are the top 50 must-know system design interview terms, explained with definitions and working examples, along with additional resources for learning.

1. Scalability

Definition: The ability of a system to handle increased load by adding resources.
Example: Adding more servers to handle growing web traffic.
Learn More: What is Scalability and How to Achieve it?

2. Load Balancer

Definition: A component that distributes incoming network traffic across multiple servers so that no single server is overloaded.
Example: Distributing web traffic across multiple EC2 instances with the AWS Elastic Load Balancing (ELB) service.
Learn More: Understanding Load Balancer

3. Microservices

Definition: An architectural pattern that structures an application as a collection of loosely coupled, independently deployable services.
Example: Breaking a monolithic application into independent services for user management, payment processing, and notifications.
Learn More: What are Microservices?

4. CAP Theorem

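Definition: It states that a distributed system can provide at most two of three guarantees: Consistency, Availability, and Partition Tolerance. Because network partitions cannot be ruled out in practice, the real choice is between consistency and availability while a partition lasts.
Example: During a network partition, a distributed database either rejects some requests to avoid serving stale data (favoring consistency) or keeps answering with possibly stale data (favoring availability).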
Learn More: Understanding CAP Theorem

5. Sharding

Definition: Breaking a large database into smaller pieces called shards, typically stored on separate servers, so that data and load can be spread horizontally.
Example: Sharding a user database based on geographic region.
Learn More: Database Sharding Explained

6. Latency

Definition: The time it takes for data to travel from one point to another, such as from a client to a server and back.
Example: Measuring the delay between sending a message in a chat application and its delivery to the recipient.
Learn More: Latency explained!

7. Throughput

Definition: The amount of data, or number of operations, a system processes in a given period of time.
Example: The number of requests a web server processes per second.
Learn More: Throughput in Computer Networks

8. Cache

Definition: A hardware or software component that stores data so that future requests for the same data can be served faster.
Example: Using Redis to cache the results of repeated database queries.
Learn More: Caching Explained

9. Content Delivery Network (CDN)

Definition: A geographically distributed network of servers that delivers web content to users from a location close to them.
Example: Using Cloudflare CDN for faster web page loading.
Learn More: What is a CDN?

10. REST API

Definition: An architectural style for building web services in which data is accessed and manipulated using standard HTTP requests.
Example: Designing a social media API according to REST (Representational State Transfer) principles.
Learn More: REST API Tutorial

11. GraphQL

Definition: A query language for APIs that lets clients request exactly the fields they need, which can make it more efficient and flexible than REST for complex or nested data.
Example: Using GraphQL to query user information in a single request.
Learn More: GraphQL Introduction

12. ACID

Definition: A set of properties ensuring reliable processing of database transactions. The properties are Atomicity, Consistency, Isolation, and Durability.
Example: Running a bank transfer as an ACID transaction, so that money is never debited from one account without being credited to the other.
Learn More: ACID Properties in Databases
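Atomicity is easy to see with Python's built-in `sqlite3` module. Used as a context manager, the connection commits if the block succeeds and rolls back if an exception escapes it. The `accounts` table and account names below are hypothetical.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Moves money atomically: both updates commit together or neither does."""
    with conn:  # commit on success, roll back if an exception escapes
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")  # rolls back the debit above
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    conn.commit()
    transfer(conn, "alice", "bob", 30)
    try:
        transfer(conn, "alice", "bob", 500)
    except ValueError:
        pass
    print(conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall())
```

The failed 500 transfer leaves no trace, because its debit was rolled back together with the rest of the transaction.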

13. BASE

Definition: An alternative to ACID that favors availability and partition tolerance over strict consistency. BASE stands for Basically Available, Soft state, Eventual consistency.
Example: Designing a highly available, eventually consistent NoSQL database.
Learn More: BASE vs ACID

14. NoSQL

Definition: A category of databases that store and retrieve data modelled in ways other than the tabular relations used in relational databases, such as documents, key-value pairs, wide columns, or graphs.
Example: Using MongoDB for a document-based data store.
Learn More: What is a NoSQL Database?

15. SQL

Definition: The standard language for storing, manipulating, and retrieving data in relational databases.
Example: Writing SQL queries to retrieve data from a relational database.
Learn More: SQL Tutorial

16. Database Indexing

Definition: A data structure technique that speeds up searching for and retrieving data in a database, at the cost of extra storage and slower writes.
Example: Creating an index on the user ID column to speed up lookups by user.
Learn More: Database Indexing

17. Replication

Definition: The process of copying and maintaining database objects across the multiple databases that make up a distributed database system.
Example: Replicating a database across several geographic regions so that it stays highly available even if one region fails.
Learn More: Database Replication

18. Failover

Definition: A backup mode of operation in which the functions of a failed primary component are taken over by a secondary (standby) component.
Example: Automatically failing over to a standby server when the primary server of a web application goes down.
Learn More: Failover Vs Disaster Recovery

19. API Gateway

Definition: A server that sits in front of one or more APIs, receiving API requests, applying throttling and security policies, and then forwarding them to backend services.
Example: Using AWS API Gateway to manage APIs.
Learn More: What is an API Gateway?

20. Service Mesh

Definition: A dedicated infrastructure layer for facilitating service-to-service communications between microservices.
Example: Integrating Istio as a service mesh for the management of microservice interactions.
Learn More: Introduction to Service Mesh

21. Serverless Computing

Definition: A cloud computing model in which the cloud provider dynamically allocates machine resources on demand, so developers do not provision or manage servers.
Example: Running backend code with AWS Lambda without provisioning any servers yourself.
Learn More: What is Serverless Computing?

22. Event-Driven Architecture

Definition: A software architecture paradigm built around producing, detecting, consuming, and reacting to events.
Example: Designing a system in which microservices communicate through events published to Apache Kafka.
Learn More: Event-Driven Architecture

23. Monolithic Architecture

Definition: A software architecture in which all components are built into a single application and run as a single service.
Example: Traditional enterprise applications built as a single, large unit.
Learn More: Monolithic vs Microservices Architecture

24. Distributed Systems

Definition: A model wherein components located on networked computers communicate with each other and coordinate their actions by passing messages.
Example: Designing a distributed file system such as the Hadoop Distributed File System (HDFS).
Learn More: Introduction to Distributed Systems

25. Message Queue

Definition: A component that enables asynchronous, service-to-service communication; it is widely used in serverless and microservices architectures.
Example: Using RabbitMQ to queue messages between services.
Learn More: Message Queues Explained

26. Pub/Sub Model

Definition: A messaging pattern in which senders (publishers) publish messages to a topic without knowing the identity of the receivers (subscribers); every subscriber to that topic receives a copy of each message.
Example: A notification system that uses Google Cloud Pub/Sub.
Learn More: Pub/Sub Messaging

27. Data Partitioning

Definition: Division of a database into smaller, manageable parts.
Example: Partitioning a table by date so that queries over a specific date range run faster.
Learn More: Database Partitioning

28. Horizontal Scaling

Definition: Increasing capacity by adding more machines or nodes to a system.
Example: Adding more web servers to handle an increasing volume of user traffic.
Learn More: Horizontal vs Vertical Scaling

29. Vertical Scaling

Definition: Increasing capacity by upgrading an existing machine with more resources, such as CPU or RAM.
Example: Upgrading the RAM of a server so that it can handle more requests all at once.
Learn More: Horizontal vs Vertical Scaling

30. Rate Limiting

Definition: Controlling the rate of traffic (for example, requests) that a client, service, or network interface can send or receive.
Example: Throttling an API to prevent abusive behaviour.
Learn More: Understanding Rate Limiting

31. Circuit Breaker Pattern

Definition: A design pattern used in modern software development, applied to detect failures and encapsulate the logic of preventing a failure from constantly recurring.
Example: Handling failed remote service calls using a circuit breaker in a microservice architecture.
Learn More: Circuit Breaker Pattern

32. Data Consistency

Definition: Ensuring that data is the same across multiple instances and is not corrupted.
Example: Maintaining the consistency of user data through multiple replicas of a database.
Learn More: Data Consistency Models

33. Eventual Consistency

Definition: A consistency model used in distributed computing to achieve high availability. It guarantees that, if no new updates are made, all nodes will eventually reflect the latest update.
Example: Amazon DynamoDB reads are eventually consistent by default, although strongly consistent reads can be requested.
Learn More: Eventual Consistency

34. Strong Consistency

Definition: A consistency model ensuring every read gets the most recent write on a given unit of data.
Example: Using strong consistency in a financial transaction system.
Learn More: Strong Consistency

35. Containerization

Definition: Packaging an application and its dependencies into a container so that it runs the same way in any computing environment.
Example: Using Docker to containerize applications for deployment across development, testing, and production environments.
Learn More: What is Containerization?

36. Kubernetes

Definition: An open-source platform that automates the process of application container deployment, scaling, and operation.
Example: Deploying and running containerized applications on a Kubernetes cluster.
Learn More: Kubernetes Documentation

37. Autoscaling

Definition: Automatically adjusting the amount of computing resources based on the current load.
Example: Using Amazon EC2 Auto Scaling to adjust the number of instances dynamically.
Learn More: Auto Scaling explained

38. Multi-Tenancy

Definition: An architecture in which a single instance of a software application serves multiple customers (tenants).
Example: SaaS applications such as Salesforce use multi-tenancy to serve many different customers from shared infrastructure.
Learn More: Single Tenancy Vs Multi-Tenancy?

39. Load Shedding

Definition: Deliberately rejecting some requests or degrading services to keep the overall system healthy under high load.
Example: Turning off non-essential features during peak traffic so that core functionality stays available.
Learn More: Load Shedding
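
This is a minimal sketch of priority-based load shedding. It assumes each request is tagged as critical or not, and that the server tracks how many requests are in flight. The class, the two thresholds, and the paths are hypothetical. A real service would return something like HTTP 503 for shed requests.

```python
from dataclasses import dataclass


@dataclass
class Request:
    path: str
    critical: bool  # e.g. checkout is critical, recommendations are not


class LoadShedder:
    """Admits requests based on the current in-flight count and request priority."""

    def __init__(self, soft_limit: int, hard_limit: int):
        self.soft_limit = soft_limit  # at or above this, shed non-essential work
        self.hard_limit = hard_limit  # at or above this, shed everything
        self.in_flight = 0

    def try_admit(self, req: Request) -> bool:
        if self.in_flight >= self.hard_limit:
            return False
        if self.in_flight >= self.soft_limit and not req.critical:
            return False
        self.in_flight += 1
        return True

    def release(self) -> None:
        # Call when an admitted request finishes.
        self.in_flight -= 1
```

Take soft_limit=2 and hard_limit=3. A third non-critical request is rejected, while a third critical one still gets in. A fourth request of any kind is rejected until release() is called.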

40. Idempotence

Definition: A property of certain operations in mathematics and computer science: applying the operation more than once has the same effect as applying it once.
Example: An HTTP DELETE request is idempotent.
Learn More: Idempotence in APIs

41. Quorum

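Definition: The minimum number of nodes (votes) that must agree before a distributed operation, such as a write or a transaction commit, is considered successful.
Example: Quorum-based replication keeps a distributed database consistent. It requires enough replicas to acknowledge each read and write that every read overlaps the latest write.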
Learn More: Quorum Systems
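
Quorums give consistency through overlap. Take N replicas, a read quorum R, and a write quorum W. If R + W > N, every read set must overlap every write set, so at least one replica in each read holds the latest acknowledged write. The sketch below checks that overlap by brute force. The function is hypothetical and illustrates the arithmetic, not any particular database's API.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Brute force: does every possible read set share a replica with every write set?"""
    replicas = range(n)
    return all(set(read_set) & set(write_set)
               for read_set in combinations(replicas, r)
               for write_set in combinations(replicas, w))


for n, r, w in [(3, 2, 2), (3, 1, 1), (5, 3, 3), (5, 2, 3)]:
    # The brute-force result always matches the simple rule R + W > N.
    print(f"N={n} R={r} W={w}: overlap={quorums_overlap(n, r, w)}, R+W>N={r + w > n}")
```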

42. Orchestration

Definition: A pattern of service interaction where a central coordinator controls the interaction between services.
Example: Using a workflow engine to manage a multi-step business process.
Learn More: Orchestration

43. Choreography

Definition: A service interaction pattern in which every service is self-contained and interacts with others through events, with no central coordinator or orchestrator.
Example: Microservices communicating through an event bus in a choreography pattern.
Learn More: Choreography vs. Orchestration

44. Service Registry

Definition: A database that keeps track of the running instances of each microservice and their network locations.
Example: Using the Eureka service registry in a microservice architecture.
Learn More: Service Registry and Discovery

45. API Rate Limiting

Definition: Controlling how many requests a client can make to an API within a given time window.
Example: Limiting requests to an API to 100 per minute to prevent abuse.
Learn More: API Rate Limiting

46. Data Warehouse

Definition: A system used for reporting and business data analytics; the central hub of business intelligence.
Example: Using Amazon Redshift as a data warehouse.
Learn More: Understanding Data Warehouse?

47. Data Lake

Definition: A repository that stores data in its native, raw format, usually as object blobs or files.
Example: Storing and managing petabytes of structured and unstructured data in a data lake.
Learn More: Data Lake

48. OLAP

Definition: Online Analytical Processing: a category of software for analyzing data stored in a database, typically through complex, multidimensional queries.
Example: Using OLAP cubes for fast analytical and ad hoc queries.
Learn More: OLAP Explained

49. OLTP

Definition: Online Transaction Processing: a class of systems that manage transaction-oriented applications.
Example: Using OLTP systems to manage transactional data, as in banking systems.
Learn More: OLTP Explained

50. Big Data

Definition: Data sets so large and complex that conventional data-processing software cannot manage them efficiently.
Example: Analyzing social media interactions to predict fashion trends.
Learn More: Introduction to Big Data

Keep in mind that system design is all about continuous learning and practice. Work through the resources, take part in discussions, and apply these concepts in your own projects. That is how you get comfortable with the vocabulary and with how each concept is used in practice.

Source: https://interviewnoodle.com/top-50-system-design-terminologies-you-must-know-3c78f5fb99c1

50 System Design

February 27, 2026

Luis Fernando Chaim

Architecture

I. Core Architecture Principles

Vertical vs Horizontal Scaling

Vertical scaling means upgrading a single machine, such as adding more CPU, RAM, or faster storage.
Horizontal scaling means adding more machines and spreading the work across them.

Vertical scaling is easier, but it hits hardware limits and becomes expensive.

Horizontal scaling is harder because it requires load balancing, stateless services, and shared storage.

Think of it this way: vertical is one superhero getting stronger, horizontal is building a team.

CAP Theorem

The CAP Theorem says that, in the presence of a network partition, a distributed system must choose between Consistency and Availability. Consistency means every user sees the same data at the same time.

Availability means the system always responds, even if the data might be slightly stale.

You cannot have perfect consistency and perfect availability while your network is partitioned, so you decide which one to sacrifice for your use case.

PACELC Theorem

PACELC extends CAP and says: if there is a Partition, choose Availability or Consistency; Else, choose Latency or Consistency.

Even when the network is healthy, you still trade off slow but consistent reads against fast but eventually consistent reads. Systems that synchronize across regions often pay in latency to keep strong consistency.

This explains why some databases are fast but slightly stale, while others are slower but always accurate.

ACID vs BASE

ACID is about strict, reliable transactions: Atomicity, Consistency, Isolation, and Durability. It suits financial systems, inventory control, and any area where mistakes are very costly.

BASE stands for Basically Available, Soft state, and Eventual consistency. It is used in large distributed systems that must stay up and respond quickly.

BASE systems may show temporary inconsistencies, but they converge over time.

In practice, many architectures combine both, using ACID for core financial flows and BASE for things like feeds and analytics.

Throughput vs Latency

Throughput is how many requests your system can process per second.
Latency is how long a single request takes from start to finish.

You can often increase throughput by doing more work in parallel, but latency may rise if queues build up.

Imagine a restaurant that handles many orders at once but keeps each customer waiting a long time. Good system design balances the two: enough throughput to handle demand peaks, with latency low enough for a smooth user experience.

Amdahl's Law

Amdahl's Law states that the speedup from parallelization is limited by the part of the work that cannot be parallelized.

If 20% of your system is always sequential, no number of extra machines will remove that bottleneck. The best possible speedup is 5x.

Let me explain further.

Suppose every request must go through a single master database. That master caps your performance. This law reminds us to look for bottlenecks instead of simply adding more servers.
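
Amdahl's Law is usually written as speedup(N) = 1 / ((1 - p) + p / N). Here p is the fraction of work that can run in parallel and N is the number of workers. The short sketch below plugs in the 20% sequential example (p = 0.8). The speedup creeps toward 1 / 0.2 = 5 no matter how large N gets.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Theoretical speedup when only `parallel_fraction` of the work can be parallelized."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# 20% sequential work means parallel_fraction = 0.8; the ceiling is 1 / 0.2 = 5x.
for workers in (1, 2, 10, 100, 10_000):
    print(f"{workers:>6} workers -> {amdahl_speedup(0.8, workers):.2f}x")
```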

Strong vs Eventual Consistency

Strong consistency means every user sees the same data immediately after a write.
Eventual consistency means updates spread over time and nodes may briefly disagree.

Strong consistency is easier to reason about, but it is usually slower and less available during failures.

Eventual consistency works well for large-scale features such as timelines or counters, where perfect freshness is not essential.

The key is to choose the model that fits the user experience you need.

Stateful vs Stateless Architecture

A stateful service remembers user context between requests, usually by storing session data locally.
A stateless service treats every request as new and relies on external stores, such as caches or databases, for any state.

Stateless services are easier to scale horizontally because any instance can handle any request.

Stateful systems can be simpler to program, but they are harder to load balance and fail over.

In modern cloud systems, we try to push state into databases and keep services as stateless as possible.

Microservices vs Monoliths

A monolith is a single application that contains many features in one deployable unit.

Microservices split features into separate services that communicate over the network.

Microservices help teams work independently and scale different parts separately. The cost is extra complexity around communication, debugging, and data consistency.

Monoliths are simpler to start with and usually work well up to a certain scale.

In fact, many great systems start as monoliths and gradually evolve into microservices when the problems become real.

Serverless Architecture

Serverless computing lets you run small functions in the cloud without managing servers directly. You pay only when your code runs, and the platform handles scaling and infrastructure for you.

It is ideal for event-driven workloads such as webhooks, background jobs, or lightweight APIs with spiky traffic.

The downsides are less control over long-running tasks, cold starts, and sometimes higher cost at very high volumes.

Think of serverless as “functions as a service”, perfect for glue code and lightweight services.

II. Networking and Communication

Load Balancing

Load balancing spreads incoming traffic across multiple servers so that no single server is overloaded. This improves both reliability and performance, since the failure of one server does not take down the whole system.

Load balancers can be hardware appliances or software services. They usually run health checks and stop sending traffic to unhealthy instances.

From an interview perspective, load balancers are the first step toward horizontal scaling.

Load Balancing Algorithms

Common load balancing algorithms include Round Robin, Least Connections, and IP Hash.

Round Robin cycles through the servers in order and is simple to implement.
Least Connections sends traffic to the server with the fewest active connections, which helps when requests vary in duration.
IP Hash uses a hash of the client's IP address, so the same user usually lands on the same server, which helps with session persistence.

Choosing the right algorithm affects fairness, resource utilization, and user experience.
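
Each of the three algorithms takes only a few lines. This is a single-process illustration with hypothetical class names. Production balancers such as NGINX and HAProxy implement these strategies, plus weighted variants, internally.

```python
import hashlib
import itertools


class RoundRobin:
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self, client_ip=None):
        return next(self._cycle)


class LeastConnections:
    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self, client_ip=None):
        server = min(self.active, key=self.active.get)  # ties go to the first server listed
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when the request on `server` completes.
        self.active[server] -= 1


class IPHash:
    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, client_ip):
        # Use a stable hash: Python's built-in hash() of str is randomized per process,
        # so two balancer processes would disagree about where a client belongs.
        digest = hashlib.sha256(client_ip.encode()).digest()
        return self.servers[int.from_bytes(digest[:8], "big") % len(self.servers)]
```

One caveat applies to IPHash. Hashing modulo the server count reassigns most clients whenever a server is added or removed, which is exactly the problem consistent hashing solves.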

Reverse Proxy vs Forward Proxy

A reverse proxy sits in front of servers and represents them to clients. It hides the internal topology and can handle TLS termination, caching, compression, and routing.
A forward proxy sits in front of clients and represents them to the outside world, usually for security, caching, or content filtering.

Think of a reverse proxy as a company's front desk that hides all the internal rooms. A forward proxy is a gateway your laptop must pass through to reach the internet.

Understanding the difference helps when discussing API gateways and corporate proxies.

API Gateway

An API gateway is a specialized reverse proxy that acts as the single entry point for all API calls in a microservices system. It handles routing to the right service, rate limiting, authentication, and logging, and sometimes response shaping.

This reduces client-side complexity, since clients talk to only one endpoint.

If you load the gateway with too much logic, it can become a bottleneck or a small monolith in its own right. Good designs keep it focused and lean.

CDN (Content Delivery Network)

A CDN is a network of geographically distributed servers that cache static content, such as images, videos, and scripts, closer to users.

When a user requests content, the request is routed to the nearest CDN node, which significantly reduces latency. It also offloads traffic from your origin servers, improving scalability and resilience.

CDNs are essential for global applications and for frontend performance.

Think of them as “local copies” of your site's heavy files, spread around the world.

DNS (Domain Name System)

DNS maps human-readable domain names to IP addresses.

When you type a website's name, your device queries DNS to find the server's numeric address.

DNS has multiple caching layers, so answers are fast after the first lookup. It can also provide simple load balancing by returning different IPs for the same name.

Understanding DNS helps explain why name changes take time to propagate and why some outages are caused by DNS misconfigurations.

TCP vs UDP

TCP is a reliable, connection-oriented protocol. It guarantees ordered, error-checked delivery through acknowledgments and retransmissions.
UDP is connectionless and guarantees neither delivery nor ordering, which makes it much lighter and faster.

TCP suits APIs, web pages, and file transfers, where correctness is essential.

UDP works well for real-time applications such as video calls or games, where occasional packet loss is acceptable.

Think of TCP as registered mail and UDP as quick postcards.

HTTP/2 and HTTP/3 (QUIC)

HTTP/2 introduced multiplexing, which lets many requests share a single TCP connection and reduces overhead. It also brought features such as header compression and server push.
HTTP/3 runs over QUIC, which is built on UDP and improves connection setup time and performance on unreliable networks. Both versions mainly aim to reduce latency and make better use of modern network conditions.

For you as an engineer, the main idea is fewer connection setups and better use of each connection.

gRPC vs REST

REST usually uses HTTP with JSON and is organized around resources such as /users or /orders URLs. It is simple, human-readable, and widely used for public APIs.
gRPC uses HTTP/2 and binary-encoded messages (Protocol Buffers), which are smaller and faster to send over the network. It also supports bidirectional streaming and strong typing.

In microservices, gRPC is often preferred for service-to-service calls, while REST is common for external clients.

Use REST when readability and compatibility matter, and gRPC when performance and strict contracts matter.

WebSockets and Server-Sent Events (SSE)

WebSockets create a full-duplex connection in which the client and the server can send each other messages at any time.

SSE lets the server push events to the client over a one-way channel using HTTP.

WebSockets are great for chat, multiplayer games, and live collaboration.

SSE is simpler. It fits cases such as live score updates or notifications, where only the server needs to send updates.

Both solve real-time communication problems that plain HTTP request/response does not handle well.

Long Polling

Long polling is a technique in which the client sends a request and the server holds it open until new data is available or a timeout is reached.

As soon as the response arrives, the client immediately opens another request. This simulates real-time updates over plain HTTP, without special protocols.

It is less efficient than WebSockets but easier to implement, and it works through most proxies and firewalls.

Think of it as asking “anything new?” and waiting quietly until there is an answer.

Gossip Protocol

A gossip protocol lets nodes in a distributed system share information by periodically talking to other, randomly chosen nodes.

Over time, the information spreads like gossip in a social group until everyone has a similar view. It is used to share membership, health status, or configuration in a fault-tolerant way.

The protocol is eventually consistent and does not rely on a central authority. This makes it ideal for large clusters where nodes join and leave frequently.

III. Database and Storage Internals

Sharding (Data Partitioning)

Sharding splits data across multiple machines, each holding a subset of the data. Common strategies include range-based, hash-based, and directory-based sharding.

The main goal is to scale storage and throughput without relying on a single giant database node.

The tricky part is choosing a shard key that avoids hot spots, where one shard receives most of the traffic. Once you have sharded, moving data between shards (resharding) becomes a major operational challenge.

Replication Patterns (Master-Slave, Master-Master)

Replication means keeping multiple copies of data on different nodes.

In a master-slave (or primary-replica) model, one node handles writes and replicates the changes to the other nodes, which serve reads.
In a master-master (multi-primary) model, several nodes accept writes and must resolve conflicts.

Replication improves read performance and availability, but it makes consistency harder, especially when writes happen on multiple nodes.

In interviews, expect to discuss how replication lag affects reads and how failover works when a master fails.

Consistent Hashing

Consistent hashing is a technique for distributing keys across nodes so that as little data as possible moves when nodes are added or removed.

Keys and nodes are placed on a logical ring, and each key belongs to the next node clockwise on the ring.

When a node joins or leaves, only a small fraction of the keys needs to move. This property is very useful in distributed caches and databases.

Think of it as a smooth mapping that does not get scrambled when the cluster size changes.
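
Below is a minimal sketch of a hash ring built on Python's bisect module. It adds one refinement not described above: virtual nodes, meaning several ring positions per server, a common way to even out the key distribution. The class name and the vnodes default are assumptions for illustration.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    # Stable 64-bit position on the ring derived from SHA-1.
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes  # virtual points per node, to even out the load
        self._ring = []       # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def get(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # First point at or after the key's position, wrapping around to the start.
        idx = bisect.bisect_left(self._ring, (ring_position(key),))
        return self._ring[idx % len(self._ring)][1]


ring = ConsistentHashRing(["A", "B", "C"])
keys = [f"user:{i}" for i in range(10_000)]
before = {k: ring.get(k) for k in keys}
ring.add("D")
moved = sum(before[k] != ring.get(k) for k in keys)
# Only keys now owned by D move; with 4 equal nodes that is about 1/4 on average.
print(f"{moved / len(keys):.0%} of keys moved")
```

When a node is added, a key moves only if one of the new node's points now precedes it on the ring. So every key that moves goes to D, and all other keys stay where they were.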

Database Indexing (B-Trees, LSM Trees)

Indexes speed up queries by organizing data so that it can be looked up quickly.

B-trees are balanced trees that keep data sorted and support efficient range lookups; they are common in relational databases.

LSM trees batch writes in memory and periodically flush them to disk. This makes writes very fast but reads more complex.

The choice is a trade-off between write-heavy and read-heavy workloads.

The key idea is that an index is a separate structure that must be updated on every write, which is why too many indexes hurt insert performance.

Write-Ahead Logging (WAL)

Write-ahead logging records changes in a log before applying them to the main database.

If a crash happens in the middle of a transaction, the system can replay the log to restore a consistent state. WAL provides durability and atomicity for transactions, and it also enables techniques such as replication from the log stream. Here is why this matters.

Without a WAL, a system crash could leave your data corrupted or partially written.

Normalization vs Denormalization

Normalization organizes data into tables that reduce redundancy and dependencies, following rules such as first normal form, second normal form, and so on. This prevents update and insert anomalies.
Denormalization intentionally duplicates data to speed up reads and reduce joins. In large-scale systems, denormalization is common on read-heavy paths. For example, you might store user names alongside posts instead of joining every time.

The real skill lies in knowing where you can safely denormalize without breaking consistency.

Polyglot Persistence

Polyglot persistence means using several types of databases within the same system, each chosen for what it does best. You might use a relational database for transactions, a document database for logs, a key-value store for caching, and a graph database for relationships.

Instead of forcing everything into a single database, you choose the right tool for each job.

The trade-off is higher operational complexity and more knowledge required from the team.

Bloom Filters

A Bloom filter is a space-efficient data structure that quickly answers the question “might this item be in the set?”. It can return false positives but never false negatives. When items are inserted, it uses multiple hash functions to set bits in a bit array.

To check for an item, you test the same bits; if any bit is zero, the item is definitely not present.

Databases and caches use Bloom filters to skip unnecessary disk reads and cache lookups for keys that do not exist.

Think of them as fast filters that say “definitely not” or “maybe”.
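
This is a compact Bloom filter sketch. Instead of k independent hash functions, it derives k bit positions from one SHA-256 digest using double hashing (h1 + i * h2), a well-known substitute. The bit-array size and hash count are parameters. You would normally size them from the expected number of items and an acceptable false-positive rate.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits: int, num_hashes: int):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Simulate k hash functions from one SHA-256 digest via double hashing: h1 + i * h2.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # Any zero bit means "definitely not"; all ones means "maybe".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```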

Vector Databases

Vector databases store and query vectors, which are numeric representations of data such as text, images, or audio. These vectors come from models as embeddings. They enable similarity searches such as “find the documents most similar to this one”.

Instead of exact equality comparisons, they use distance metrics such as cosine similarity or Euclidean distance. This is essential for modern search, recommendation, and AI assistant systems.

In interviews, it is enough to know that vector databases support nearest-neighbor search over high-dimensional data.
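
Here is a brute-force nearest-neighbor lookup using cosine similarity, to show what similarity search computes. The three-dimensional “embeddings” are toy values made up for illustration; real embeddings have hundreds or thousands of dimensions. Real vector databases also use approximate indexes such as HNSW rather than scanning every vector.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, vectors, k=2):
    """Return the ids of the k stored vectors most similar to `query` (exhaustive scan)."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in vectors.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy embeddings, invented for the example.
docs = {
    "cats": [0.9, 0.1, 0.0],
    "kittens": [0.85, 0.15, 0.05],
    "stocks": [0.0, 0.2, 0.95],
}
print(top_k([1.0, 0.1, 0.0], docs, k=2))
```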

IV. Reliability and Fault Tolerance

Rate Limiting

Rate limiting controls how many requests a user, IP address, or API key can make in a given period. It protects your system from abuse, accidental traffic spikes, and runaway loops.

Common strategies include fixed window, sliding window, and token bucket.

Rate limits are usually enforced at the API gateway or the load balancer.

Think of them as safety brakes that keep shared resources from being overwhelmed.
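
Here is a minimal token bucket sketch. The bucket holds up to capacity tokens and refills at refill_rate tokens per second, and each request spends one token. The injectable clock parameter is an assumption added so the behavior can be tested deterministically. In a real deployment, the bucket state usually lives in a shared store so that every gateway instance sees the same counts.

```python
import time


class TokenBucket:
    def __init__(self, capacity: float, refill_rate: float, clock=time.monotonic):
        self.capacity = capacity        # maximum burst size
        self.refill_rate = refill_rate  # tokens added per second (sustained rate)
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Keep one bucket per client key, for example in a dict keyed by API key. Capacity sets the allowed burst, and refill_rate sets the sustained rate.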

Circuit Breaker Pattern

A circuit breaker monitors calls to a remote service and “trips” (opens) the circuit if there are too many failures.

While open, it rejects new requests immediately instead of retrying the struggling service.

After a cooldown period, it lets a few trial calls through to check whether the service has recovered. If they succeed, it closes the circuit again. This pattern prevents cascading failures, in which one slow service can take down the whole system.

Here is the tricky part. Circuit breakers must be tuned carefully so they open neither too eagerly nor too late.
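
Below is a minimal, single-threaded circuit breaker sketch. It covers the three states: closed (calls pass through), open (fail fast), and half-open (a trial call after the cooldown). The thresholds, class names, and injectable clock are illustrative assumptions. Libraries such as Resilience4j add sliding-window failure rates and limits on concurrent trial calls.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without being attempted."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout  # cooldown in seconds before a trial call
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("failing fast; downstream marked unhealthy")
            self.state = "half_open"  # cooldown elapsed: let a trial call through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        # A failed trial call, or too many failures while closed, opens the circuit.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()

    def _on_success(self):
        self.failures = 0
        self.state = "closed"
```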

Bulkhead Pattern

The bulkhead pattern isolates parts of a system so that a failure in one area does not take down the whole system. This can mean separate connection pools, thread pools, or even entire service clusters for different features.

If one bulkhead is flooded with traffic, the others keep working.

The name comes from the bulkheads in ships, which contain water within a single compartment.

In design discussions, using bulkheads shows that you are thinking about fault isolation and blast radius.

Retry and Exponential Backoff Patterns

Retries help recover from transient errors such as network timeouts or temporary overload.

Exponential backoff means each retry waits longer than the previous one, for example 1 second, 2 seconds, 4 seconds, and so on. This keeps your client from hammering a service that is already struggling.

Good retry policies also add jitter, a small amount of randomness, to avoid thundering-herd behavior where many clients retry at the same moment.

Let me explain further.

Retries without any waiting period can make outages worse instead of helping.
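
The sketch below combines retries, capped exponential backoff, and “full jitter”, one common jitter variant that sleeps a random time between zero and the current backoff. Which exceptions count as transient, the attempt limit, and the delays are assumptions to adjust. The sleep parameter is injectable only so the function can be tested without waiting.

```python
import random
import time


def retry_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       retry_on=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Call fn(), retrying transient errors with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Backoff grows 1s, 2s, 4s, ... but never beyond max_delay.
            backoff = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: wait a random time in [0, backoff] so clients do not retry in lockstep.
            sleep(random.uniform(0, backoff))
```

Errors outside retry_on, such as a validation error, propagate immediately. Retrying a request that can never succeed only adds load.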

Idempotency

An operation is idempotent if performing it multiple times has the same effect as performing it once.

For example, “set the user's status to active” is idempotent, while “increase the account balance by 10” is not.

Idempotency is crucial when systems use retries, because the same request may be sent more than once.

APIs often require idempotency keys on operations such as payments to prevent duplicate charges.

In interviews, always mention idempotency when you talk about at-least-once delivery or retries.
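
Here is a minimal sketch of idempotency keys on a payment endpoint. The client generates one key per logical payment and reuses it on every retry. The server stores the first response under that key and replays it for any duplicate. The PaymentService class is hypothetical and keeps its state in memory. A real service would persist keys with a uniqueness constraint and handle two requests with the same key arriving concurrently.

```python
import uuid


class PaymentService:
    """Hypothetical payment handler that deduplicates requests by idempotency key."""

    def __init__(self):
        self._responses = {}  # idempotency key -> first response returned
        self.total_charged_cents = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        if idempotency_key in self._responses:
            # Duplicate (e.g. a client retry): replay the stored result, do not charge again.
            return self._responses[idempotency_key]
        self.total_charged_cents += amount_cents
        response = {"charge_id": str(uuid.uuid4()), "amount_cents": amount_cents}
        self._responses[idempotency_key] = response
        return response
```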

Heartbeats

A heartbeat is a periodic signal sent by a service or node to indicate that it is alive and working correctly.

Monitoring systems or coordinators listen for heartbeats.

If they stop receiving them, they mark the node as down and trigger failover or scaling actions.

Heartbeats are simple but powerful tools for liveness detection. Think of them as the system's “pulse checks”.
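
Below is a minimal heartbeat monitor sketch. Nodes call beat() periodically, and any node silent for longer than the timeout is reported as dead. The class name, the timeout, and the injectable clock are assumptions. Real failure detectors usually tolerate several missed heartbeats before declaring a node dead, to avoid false alarms from a single delayed packet.

```python
import time


class HeartbeatMonitor:
    def __init__(self, timeout: float, clock=time.monotonic):
        self.timeout = timeout  # seconds of silence before a node counts as dead
        self.clock = clock
        self.last_seen = {}

    def beat(self, node_id: str) -> None:
        # Record the time of the latest heartbeat from this node.
        self.last_seen[node_id] = self.clock()

    def dead_nodes(self) -> list:
        now = self.clock()
        return sorted(node for node, seen in self.last_seen.items() if now - seen > self.timeout)
```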

Leader Election (Paxos, Raft)

Leader election is the process of choosing a single node among many to act as coordinator.

Consensus algorithms such as Paxos and Raft ensure that only one leader is chosen at a time and that all nodes eventually agree on who that leader is.

The leader handles tasks such as assigning work, managing metadata, or ordering writes. If the leader fails, a new one is elected automatically.

You do not need to memorize the math for interviews. You should know, though, that consensus algorithms underpin many critical systems, such as metadata stores and distributed logs.

Distributed Transactions (SAGA Pattern)

A distributed transaction spans multiple services or databases.

The SAGA pattern models such a transaction as a sequence of local steps, each with a compensating action for rollback.

Instead of locking everything as in a single ACID transaction, each service performs its part and publishes an event. If something fails, compensating steps try to undo the earlier changes. This fits naturally with microservices and eventual consistency.

The trade-off is more complex logic and the possibility of partial failures that must be handled gracefully.
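
Here is a minimal saga sketch. For brevity it uses a central coordinator function (the orchestration style) rather than services reacting to each other's events. The core idea is the same: run local steps in order and, if one fails, run the compensations for the completed steps in reverse. All step names are hypothetical.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order; on failure, undo completed steps in reverse."""
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()  # a real system must retry or alert if a compensation itself fails
            raise
        completed.append(compensation)


log = []


def reserve_inventory():
    log.append("reserve inventory")


def release_inventory():
    log.append("release inventory")


def charge_card():
    raise RuntimeError("card declined")


def refund_card():
    log.append("refund card")


def ship_order():
    log.append("ship order")


try:
    run_saga([(reserve_inventory, release_inventory),
              (charge_card, refund_card),
              (ship_order, lambda: None)])
except RuntimeError as exc:
    print(f"saga failed: {exc}; log = {log}")
```

In the log, the payment failure triggers the inventory release, and shipping never runs. No refund is issued because the charge step never completed.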

Two-Phase Commit (2PC)

Two-Phase Commit is a protocol that tries to provide atomic transactions across multiple nodes.

In the first phase, the coordinator asks all participants whether they can commit.
In the second phase, if everyone agrees, the coordinator tells them to commit; otherwise, it tells them to roll back.

2PC provides strong guarantees, but it can block if the coordinator fails, and it is expensive at scale because of locking.

In modern cloud systems, 2PC is often avoided on high-throughput paths and replaced with patterns such as SAGA.

V. Caching and Messaging

Caching

Caching stores frequently accessed data in a fast storage layer, usually in memory, to reduce latency and server load.

Common cache layers include in-process caches, external key-value stores, and CDNs. Caching is especially effective for read-heavy workloads and expensive computations.

And here is the tricky part. Stale data and invalidation make caching harder than it looks at first.

As the saying goes, cache invalidation is one of the hardest problems in computer science.

Caching Strategies (Cache Aside, Write Through, etc.)

Cache aside means the application reads from the cache and, on a miss, loads the data from the database and writes it into the cache.
Write through writes to the cache and the database at the same time, keeping the cache in sync with the source of truth.
Write back writes to the cache first and flushes the data to the database later. This is fast but risky if the cache fails before flushing.

Each strategy balances freshness, complexity, and performance differently.

Interviewers love it when you say which strategy you would choose for a given scenario.
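
Here is a minimal cache-aside sketch, with plain dicts standing in for the database and for a cache such as Redis. On writes, it updates the database and then deletes the cached entry, a common way to avoid serving stale data. The class and method names are hypothetical, and a real cache would also set a TTL on each entry.

```python
class CacheAsideRepository:
    def __init__(self, db: dict):
        self.db = db        # stand-in for the real database (source of truth)
        self.cache = {}     # stand-in for an external cache such as Redis
        self.db_reads = 0

    def get(self, key):
        if key in self.cache:
            return self.cache[key]   # cache hit
        self.db_reads += 1
        value = self.db.get(key)     # cache miss: load from the database
        if value is not None:
            self.cache[key] = value  # populate the cache for the next read
        return value

    def update(self, key, value):
        self.db[key] = value
        self.cache.pop(key, None)    # invalidate so the next read reloads fresh data
```

A write-through variant would set the new value in the cache inside update() instead of deleting it.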

Cache Eviction Policies (LRU, LFU)

Cache eviction policies decide which items to remove when the cache is full.

LRU (Least Recently Used) removes items that have not been accessed recently, on the assumption that recently accessed items are more likely to be used again.
LFU (Least Frequently Used) removes items that are accessed rarely, favoring long-term popularity.

Some systems use random, FIFO, or more advanced algorithms.

The key idea is that cache space is limited, so you want to keep the most valuable items in memory.
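
An LRU cache takes only a few lines with collections.OrderedDict, which can move an entry to the end on access and pop the oldest entry from the front. This is an illustrative sketch. For memoizing function calls, Python's standard library already provides functools.lru_cache.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()  # ordered from least to most recently used

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
```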

Message Queues (Point to Point)

A message queue lets one component send messages to another without both needing to be online at the same time.

In a point-to-point model, each message in a queue is consumed by a single receiver and then removed. This decouples sender and receiver, letting them scale and handle failures independently.

Queues are great for background jobs, sending emails, and asynchronous processing of heavy tasks.

Think of them as a shared to-do list between services.

Pub Sub (Publish Subscribe)

In a publish/subscribe (pub/sub) model, publishers send messages to topics, not directly to consumers.

Subscribers listen to the topics they care about and receive copies of relevant messages. This enables broadcast-style communication and loose coupling between producers and consumers.

Several services can react to the same event in different ways, such as logging, analytics, and notifications.

In interviews, pub/sub comes up often in event-driven designs such as activity feeds or event sourcing.

Filas de cartas não entregues

Uma fila de mensagens não processadas armazena mensagens que não puderam ser processadas com sucesso após várias tentativas.

Em vez de ficar tentando indefinidamente e bloqueando a fila principal, essas mensagens são movidas para outro lado.

Engineers can inspect the dead letter queue to debug issues, fix data, or replay messages later. This pattern improves resilience and keeps your system from getting stuck on “poison messages”.

Think of it as a holding area for problematic jobs.
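Here is a simplified sketch of the dead letter pattern. It assumes retries happen inline; real brokers usually redeliver with a delay and track a delivery count instead. After `maxAttempts` failures, the message is parked in the DLQ and processing moves on, so one poison message never blocks the rest. `DlqProcessor` is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

// Sketch: retry each message up to maxAttempts, then park it in a dead letter queue
class DlqProcessor {
    private final Deque<String> mainQueue = new ArrayDeque<>();
    private final List<String> deadLetters = new ArrayList<>();
    private final int maxAttempts;

    DlqProcessor(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    void enqueue(String msg) {
        mainQueue.addLast(msg);
    }

    List<String> deadLetters() {
        return deadLetters;
    }

    // handler returns true on success; an exception counts as a failed attempt
    List<String> drain(Predicate<String> handler) {
        List<String> succeeded = new ArrayList<>();
        String msg;
        while ((msg = mainQueue.pollFirst()) != null) {
            boolean ok = false;
            for (int attempt = 1; attempt <= maxAttempts && !ok; attempt++) {
                try {
                    ok = handler.test(msg);
                } catch (RuntimeException e) {
                    ok = false;
                }
            }
            if (ok) {
                succeeded.add(msg);
            } else {
                deadLetters.add(msg); // moved aside for inspection or later replay
            }
        }
        return succeeded;
    }
}
```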

VI. Observability and Security

Distributed Tracing

Distributed tracing follows a single request as it flows through multiple services. The trace ID is propagated from service to service, and each service records spans (timed units of work) under it, so you can reconstruct the full path of a request. This is extremely useful when debugging slow responses or failures in microservice architectures.

Without tracing, you see only isolated errors. With it, you see the full context across services, queues, and databases.
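This sketch shows how a trace context travels between services, using the W3C Trace Context `traceparent` header format (`00-<trace-id>-<parent-id>-<flags>`). The trace ID stays the same on every hop, while each outgoing call gets a new span ID (the `parent-id` field in the spec). `TraceContext` is an illustrative record. In practice a library such as OpenTelemetry generates and propagates these values for you.

```java
import java.security.SecureRandom;
import java.util.HexFormat;

// Illustrative trace context in the W3C "traceparent" format: 00-<trace-id>-<parent-id>-<flags>
record TraceContext(String traceId, String spanId) {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Random lowercase hex string of the given size in bytes
    private static String randomHex(int bytes) {
        byte[] buf = new byte[bytes];
        RANDOM.nextBytes(buf);
        return HexFormat.of().formatHex(buf);
    }

    // Entry service with no incoming header: start a new trace (16-byte trace ID, 8-byte span ID)
    static TraceContext newTrace() {
        return new TraceContext(randomHex(16), randomHex(8));
    }

    // Before calling a downstream service: keep the trace ID, create a new span ID
    TraceContext childSpan() {
        return new TraceContext(traceId, randomHex(8));
    }

    // Header value sent downstream; flags 01 = sampled
    String toTraceparent() {
        return "00-" + traceId + "-" + spanId + "-01";
    }

    // Downstream service: parse the incoming header to join the same trace
    static TraceContext fromTraceparent(String header) {
        String[] parts = header.split("-");
        if (parts.length != 4 || parts[1].length() != 32 || parts[2].length() != 16) {
            throw new IllegalArgumentException("invalid traceparent: " + header);
        }
        return new TraceContext(parts[1], parts[2]);
    }
}
```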

SLA vs SLO vs SLI

An SLA (Service Level Agreement) is an external promise made to customers, such as “99.9% uptime per month”.

An SLO (Service Level Objective) is an internal target that engineers aim to meet, usually stricter than the SLA. An SLI (Service Level Indicator) is the metric actually measured, such as real uptime or request success rate.

Think of the SLA as the contract, the SLO as the goal, and the SLI as the scoreboard.

In interviews, using these terms correctly shows mature thinking about reliability.
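The arithmetic behind these terms is simple enough to sketch. An availability target implies an error budget, and the measured SLI is compared against it. This assumes a 30-day window and a request-based SLI (successful requests / total requests); `ErrorBudget` is a hypothetical helper. A 99.9% monthly SLA leaves 30 × 24 × 60 × 0.001 = 43.2 minutes of downtime. A stricter 99.95% SLO leaves only 21.6 minutes.

```java
// Hypothetical helper: from availability targets (SLA/SLO) to an error budget, checked against an SLI
class ErrorBudget {
    // Minutes of downtime allowed by 'target' (e.g. 0.999) over a window of 'days'
    static double allowedDowntimeMinutes(double target, int days) {
        return days * 24 * 60 * (1.0 - target);
    }

    // Request-based SLI: the fraction of requests that succeeded
    static double availabilitySli(long successful, long total) {
        return total == 0 ? 1.0 : (double) successful / total;
    }

    // The SLI "scoreboard" compared with a target (the SLA or the stricter SLO)
    static boolean meetsTarget(double sli, double target) {
        return sli >= target;
    }
}
```

An SLI of 999,300 successes out of 1,000,000 requests (99.93%) still honors a 99.9% SLA. It misses a 99.95% SLO, which is exactly the early warning the stricter internal target is meant to give.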

OAuth 2.0 and OIDC

OAuth 2.0 is a framework for delegated authorization. It lets users grant an application limited access to their resources without sharing their passwords.

OIDC (OpenID Connect) builds on OAuth 2.0 to add authentication. It lets clients verify who the user is and obtain identity information about them. It is the foundation of many “Log in with X” flows.

The core idea is that an authorization server issues tokens that clients and APIs can trust.

TLS/SSL Handshake

TLS/SSL secures communication between client and server by encrypting data in transit.

During the handshake, the client and server agree on encryption algorithms, exchange keys securely, and verify certificates.

Once the handshake completes, all subsequent data is encrypted and protected against eavesdropping. That is what makes the little padlock icon appear in your browser.

Without TLS, anyone on the network could read or modify sensitive traffic.

Zero Trust Security

Zero Trust is a security model built on the principle “never trust, always verify”. It assumes threats can exist both outside and inside the network.

Every request must be authenticated, authorized, and encrypted, even if it comes from inside your data center or VPC. Access is granted based on identity, device posture, and context, not just on being “inside the firewall”.

In modern architectures, Zero Trust is becoming the default approach to designing secure systems.

Reference

Key Takeaways

System design is mostly about understanding trade-offs: consistency vs availability, latency vs throughput, simplicity vs flexibility.
Scaling is not just “add more servers”. You need to think about load balancing, sharding, replication, and bottlenecks.
Reliability patterns such as rate limiting, circuit breakers, retries, and bulkheads exist because failures are normal in distributed systems.
Caching, queues, and pub/sub are your best allies for performance and decoupling, but they introduce new challenges around consistency and ordering.
Observability and security concepts such as tracing, SLIs, OAuth, TLS, and Zero Trust are essential for systems that are not only fast but also secure and debuggable.

Source: https://designgurus.substack.com/p/50-system-design-concepts-for-beginners

SAP RESTful integration

February 23, 2026

Luis Fernando Chaim

Artificial Intelligence, SAP

Prompt

I have a Spring Boot application and need to integrate it with SAP to send Products and Stock, which will be updated in the SAP WMS.
Identify the steps in SAP to expose a secure RESTful API that follows the CRUD concept for the Product entity.
Identify the steps in the Spring Boot application to authenticate against the SAP API and send a POST to create a new Product.

Enhanced Prompt

“Develop an integration between a Spring Boot application and SAP to send Product and Stock data, which will be updated in the SAP WMS. To do so:

In SAP: a. Identify the steps needed to create a secure RESTful API that follows the CRUD concept (Create, Read, Update, Delete) for the Product entity. b. Specify the security settings required for the API, including authentication and authorization. c. Detail how to configure and implement the API in SAP, following security and performance best practices.
In the Spring Boot application: a. Describe the steps to authenticate against the SAP RESTful API using a suitable authentication mechanism (for example, OAuth, Basic Auth, etc.). b. Implement a code example showing how to send a POST request to the SAP API to create a new Product, including:
- HTTP client configuration (for example, RestTemplate or WebClient).
- Preparation of the Product data to be sent.
- Error and exception handling.
Consider:
- The SAP API must be configured to use a secure communication protocol (HTTPS).
- The Spring Boot application must handle possible communication errors, such as timeouts and error responses from SAP.
- The implementation must follow security best practices and coding standards in both SAP and the Spring Boot application.

Provide technical details and code examples where applicable, and keep the explanation clear and concise.”

Abacus (Claude Sonnet 4.6)

Spring Boot ↔ SAP (WMS) Integration via a RESTful API

🔷 Part 1 (SAP): Exposing a RESTful API for Product

1. Prerequisites in SAP

SAP S/4HANA or SAP ECC with SAP Gateway installed
WM/EWM module active for stock/WMS management
Access to SAP Business Technology Platform (BTP) (optional, but recommended for modern APIs)

2. Create the OData Service (RESTful API in SAP)

SAP exposes APIs via OData (a REST-based protocol) using the SAP Gateway framework.

Step by step in SAP GUI:

a) Create the project in SAP Gateway

Transaction: SEGW (SAP Gateway Service Builder)
→ Create a new project: ZWM_PRODUCT_SRV
→ Define Entity Type: Product
   - ProductID (Key, String)
   - Description (String)
   - Quantity (Decimal)
   - UnitOfMeasure (String)
   - StorageLocation (String)
→ Define Entity Set: ProductSet
→ Generate Runtime Objects (the MPC/DPC classes that serve Create, Read, Update, Delete, Query)

b) Implement the CRUD methods in the data provider class

Transaction: SEGW → redefine the methods of the *_DPC_EXT class
(generated per entity set, e.g. PRODUCTSET_CREATE_ENTITY):
- CREATE_ENTITY  → Creates the product in the WMS
- GET_ENTITY     → Reads a product by ID
- GET_ENTITYSET  → Lists all products
- UPDATE_ENTITY  → Updates a product
- DELETE_ENTITY  → Deletes a product

c) Activate and publish the service

Transaction: /IWFND/MAINT_SERVICE
→ Add Service → search for ZWM_PRODUCT_SRV
→ Activate the service
→ Generated URL: https://<sap-host>:<port>/sap/opu/odata/sap/ZWM_PRODUCT_SRV/

3. API Security in SAP

a) Authentication: OAuth 2.0 (recommended)

SAP BTP → Security → OAuth 2.0 Configuration
→ Create an OAuth Client:
   - Client ID: spring_wms_client
   - Grant Type: Client Credentials
   - Scope: ZWM_PRODUCT_SRV
→ Token URL: https://<btp-host>/oauth/token

b) Authentication: Basic Auth (simple alternative)

Transaction: SU01
→ Create a technical user: WMS_API_USER
→ User type: System (B)
→ Assign profile: Z_WMS_API_PROFILE

c) Authorization: Profiles and Roles

Transaction: PFCG
→ Create Role: Z_WMS_API_ROLE
→ Assign authorization object: S_SERVICE
   - SRV_NAME: ZWM_PRODUCT_SRV
   - SRV_TYPE: HT (HTTP Service)
→ Generate the profile and assign the role to the technical user

d) HTTPS / SSL

Transaction: STRUST
→ Import the SSL certificate
→ Enable HTTPS in the ICM (Internet Communication Manager)
Transaction: SMICM → HTTPS active on port 443

4. Test the API in SAP

Transaction: /IWFND/GW_CLIENT
→ GET /ProductSet            → lists products
→ GET /ProductSet('P001')    → reads by ID
→ POST /ProductSet           → creates a product
→ PUT /ProductSet('P001')    → updates
→ DELETE /ProductSet('P001') → deletes

🔶 Part 2 (Spring Boot): Authenticating and Sending a POST to SAP

1. Dependencies: `pom.xml`

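<dependencies>
    <!-- Spring MVC: with both web and webflux present, Boot runs the servlet stack, where .block() is allowed -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <!-- HTTP Client (WebClient) -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-webflux</artifactId>
    </dependency>
    <!-- Bean Validation, needed for @Valid in the controller -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-validation</artifactId>
    </dependency>
    <!-- OAuth2 Client -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-oauth2-client</artifactId>
    </dependency>
    <!-- Lombok (compile time only) -->
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>
</dependencies>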

2. Configuration: `application.yml`

sap:
  base-url: https://<sap-host>/sap/opu/odata/sap/ZWM_PRODUCT_SRV
  oauth:
    token-url: https://<btp-host>/oauth/token
    client-id: spring_wms_client
    client-secret: ${SAP_CLIENT_SECRET}   # read from the environment/vault, never commit it
    scope: ZWM_PRODUCT_SRV
  # Basic Auth alternative:
  basic:
    username: WMS_API_USER
    password: ${SAP_API_PASSWORD}

3. Model: `ProductRequest.java`

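import com.fasterxml.jackson.annotation.JsonFormat;
import com.fasterxml.jackson.annotation.JsonProperty;
import lombok.*;
import java.math.BigDecimal;

// OData property names are case-sensitive: @JsonProperty maps each field to the Entity Type property.
// @NoArgsConstructor lets Jackson deserialize the controller's @RequestBody; @Builder needs @AllArgsConstructor.
@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class ProductRequest {
    @JsonProperty("ProductID") private String productID;
    @JsonProperty("Description") private String description;
    @JsonProperty("Quantity") @JsonFormat(shape = JsonFormat.Shape.STRING) private BigDecimal quantity; // Edm.Decimal is a JSON string in OData V2
    @JsonProperty("UnitOfMeasure") private String unitOfMeasure;
    @JsonProperty("StorageLocation") private String storageLocation;
}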

4. OAuth2 Service: `SapAuthService.java`

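import com.fasterxml.jackson.databind.JsonNode;
import lombok.RequiredArgsConstructor;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.http.MediaType;
import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.BodyInserters;
import org.springframework.web.reactive.function.client.WebClient;

@Service
@RequiredArgsConstructor
public class SapAuthService {

    @Value("${sap.oauth.token-url}")
    private String tokenUrl;

    @Value("${sap.oauth.client-id}")
    private String clientId;

    @Value("${sap.oauth.client-secret}")
    private String clientSecret;

    @Value("${sap.oauth.scope}")
    private String scope;

    private final WebClient webClient = WebClient.create();

    // Requests a new token on every call; in production, cache it until shortly before "expires_in"
    public String getAccessToken() {
        JsonNode json = webClient.post()
            .uri(tokenUrl)
            .contentType(MediaType.APPLICATION_FORM_URLENCODED)
            .body(BodyInserters.fromFormData("grant_type", "client_credentials")
                .with("client_id", clientId)
                .with("client_secret", clientSecret)
                .with("scope", scope))
            .retrieve()
            .bodyToMono(JsonNode.class)
            .block();
        if (json == null || !json.hasNonNull("access_token")) {
            throw new IllegalStateException("Token endpoint did not return an access_token");
        }
        return json.get("access_token").asText();
    }
}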

5. Integration Service: `SapProductService.java`

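import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpStatusCode;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

import java.util.List;
import java.util.stream.Collectors;

@Service
@RequiredArgsConstructor
@Slf4j
public class SapProductService {

    @Value("${sap.base-url}")
    private String sapBaseUrl;

    private final SapAuthService authService;
    private final WebClient.Builder webClientBuilder;

    public void createProduct(ProductRequest product) {
        String token = authService.getAccessToken();

        // clone(): otherwise each call would pile another Authorization header onto the shared builder
        WebClient client = webClientBuilder.clone()
            .baseUrl(sapBaseUrl)
            .defaultHeader(HttpHeaders.AUTHORIZATION, "Bearer " + token)
            .defaultHeader(HttpHeaders.ACCEPT, MediaType.APPLICATION_JSON_VALUE)
            .build();

        // OData requires an X-CSRF-Token; it is bound to the session, so the cookies must be reused too
        ResponseEntity<Void> csrf = fetchCsrf(client);
        String csrfToken = csrf.getHeaders().getFirst("X-CSRF-Token");
        String cookies = toCookieHeader(csrf.getHeaders().getOrDefault(HttpHeaders.SET_COOKIE, List.of()));

        client.post()
            .uri("/ProductSet")
            .contentType(MediaType.APPLICATION_JSON)
            .header("X-CSRF-Token", csrfToken)
            .header(HttpHeaders.COOKIE, cookies)
            .bodyValue(product)
            .retrieve()
            .onStatus(HttpStatusCode::isError, response ->
                response.bodyToMono(String.class)
                    .defaultIfEmpty("")
                    .flatMap(body -> Mono.error(
                        new IllegalStateException("SAP API error " + response.statusCode() + ": " + body))))
            .toBodilessEntity()
            .block();

        log.info("Product {} created in SAP WMS", product.getProductID());
    }

    // OData requires a CSRF token for write operations (POST/PUT/DELETE)
    private ResponseEntity<Void> fetchCsrf(WebClient client) {
        return client.get()
            .uri("/ProductSet?$top=1")
            .header("X-CSRF-Token", "Fetch")
            .retrieve()
            .toBodilessEntity()
            .block();
    }

    // Keeps only "name=value" from each Set-Cookie header (Path, HttpOnly, etc. are not sent back)
    private static String toCookieHeader(List<String> setCookies) {
        return setCookies.stream()
            .map(c -> c.split(";", 2)[0].trim())
            .collect(Collectors.joining("; "));
    }
}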

6. Controller: `ProductController.java`

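import jakarta.validation.Valid;
import lombok.RequiredArgsConstructor;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/products")
@RequiredArgsConstructor
public class ProductController {

    private final SapProductService sapProductService;

    // @Valid only has an effect once ProductRequest declares constraint annotations (@NotBlank, etc.)
    @PostMapping
    public ResponseEntity<String> createProduct(@RequestBody @Valid ProductRequest request) {
        sapProductService.createProduct(request);
        return ResponseEntity.status(HttpStatus.CREATED)
            .body("Product sent to SAP WMS successfully");
    }
}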

🔑 Key Points

Topic	Detail
X-CSRF-Token	Required for every SAP OData write; always fetch it (together with the session cookies) before the POST
OAuth vs Basic	OAuth 2.0 is recommended for production; Basic Auth only for dev/test
OData vs plain REST	SAP Gateway uses OData; for plain REST, consider SAP Integration Suite
SAP BTP	Ideal for exposing modern APIs with managed security
SSL certificate	In production, configure the `WebClient` TLS context to trust the SAP certificate
Retry/Circuit Breaker	Add Resilience4j for fault tolerance in the integration

Adapta

Overview

Below is a complete technical guide for integrating a Spring Boot application with SAP to send Product data (and, by extension, Stock) for updating in the SAP WMS. The focus is on:

Exposing a secure RESTful API (typically OData) in SAP, with CRUD for Product.
Consuming that API from Spring Boot with good practices for security, robustness, and performance.
Practical configuration and code examples.

Important note: when possible, consider using a standard SAP API (e.g., Product Master) instead of building a custom one. If no standard API fits, use RAP (ABAP RESTful Application Programming Model) to create a custom OData V4 API. The steps below cover both scenarios.

1) In SAP

1.a) Steps to create a CRUD RESTful API (Product entity)

There are two main approaches:

Use a standard API (recommended when one exists)

Example: S/4HANA has the “Product Master” API (OData v2 service) with the A_Product entity. It already provides reads and, in some scenarios, create/update, depending on the S/4HANA edition and authorizations.
Steps:
- Activate/check the service in SAP Gateway (on-prem: transaction /IWFND/MAINT_SERVICE; cloud: Communication Arrangement).
- Publish the endpoint and make sure the HTTPS channel is active.
- Create communication users (Cloud) or technical users (on-prem), roles, and authorization profiles.
- Configure authentication (OAuth 2.0 Client Credentials in Cloud; Basic or OAuth on-prem) and scopes.
- Test via the SAP API Hub or directly against the OData endpoint.

Create a custom API with ABAP RAP (OData V4)

Steps in ADT (ABAP Development Tools in Eclipse):
1. Data modeling (CDS):
  - Create a CDS View Entity Z_I_Product with keys and business fields (ProductId, SKU, Name, Description, Unit, Price, WarehouseId, etc.).
  - Set the @AccessControl.authorizationCheck annotation for access control (if you use a DCL).
2. Behavior (Behavior Definition/Implementation):
  - Create a Behavior Definition (BDEF) enabling create, update, delete, and read.
  - Implement the Behavior Implementation (ABAP class) for business rules, validations, and persistence.
3. Service Definition:
  - Define the Service Definition exposing the Product entity (e.g., expose Z_I_Product as Products).
4. Service Binding:
  - Create a Service Binding (OData V4, UI or Web API). Publish the service and note the endpoint path.
5. Activation and testing:
  - Activate the objects, use the ADT preview to inspect $metadata, and test the CRUD operations.
6. Security and authorization:
  - Define authorization objects and roles (PFCG).
  - Bind scopes/roles to the service exposure (IAM in Cloud; PFCG roles on-prem).
7. Gateway/ICF:
  - Make sure the service is reachable via ICF (SICF) and that HTTPS is active (STRUST/ICM).

Tip: If the WMS is SAP EWM and you really need to update stock, check the standard stock and warehouse task APIs. If they do not cover the flow, also create an “Inventory/Stock” entity via RAP or compose one through API Management.

1.b) Security settings (authentication and authorization)

Protocol:
- HTTPS only (TLS 1.2+). Disable weak ciphers. On-prem, configure this in STRUST/ICM. In Cloud, HTTPS is the default.
Authentication:
- SAP S/4HANA Cloud and BTP: preferably OAuth 2.0 Client Credentials (machine-to-machine).
- On-prem (Gateway): Basic Auth over HTTPS, or OAuth 2.0 through SAP Identity Authentication or another IdP.
Authorization:
- Define roles and scopes: “Product.Read”, “Product.Write”, “Product.Delete”, etc.
- Assign roles to technical users (on-prem) or to “Communication Users/Arrangements” (Cloud).
CSRF:
- For write operations (POST/PUT/PATCH/DELETE) in OData, use the X-CSRF-Token header.
- Flow: first request a token with the header X-CSRF-Token: Fetch; then send the write request with X-CSRF-Token set to the returned token, together with the session cookie returned in the same response.
Additional protections:
- Rate limiting and quotas via SAP API Management (if the API is exposed through that layer).
- IP allowlist.
- Logging/auditing and masking of sensitive data.
- CORS as needed (if external front-ends consume the API).
- Content: accept and respond with application/json; reject unexpected formats.

1.c) Configuring and implementing the API in SAP with security and performance in mind

Performance (RAP/OData):
- Use $select to reduce the columns returned.
- Paginate ($top/$skip).
- Use ETags with If-Match for optimistic concurrency control (and If-None-Match for caching).
- Use $batch for bulk operations when you need to create/update many records.
- Push work down to the database: CDS, AMDP when needed, indexes on the tables.
- Enable GZIP in the ICM (HTTP compression).
Hardening:
- Validate payloads in the Behavior Implementation (maximum lengths, formats, ranges).
- Sanitize input (avoid injection in dynamic queries).
- Separate read and write users (principle of least privilege).
Observability:
- Enable access/error logs in the ICM/Gateway.
- Correlation: support an X-Correlation-Id header (propagated from the client).
Endpoint publication:
- Generic example of an OData V4 (RAP) endpoint:
  - GET/POST: https:///sap/opu/odata4////Products
  - PATCH/DELETE/GET by key: https:///…/Products(”)
- OData V2 (standard API, e.g. Product Master):
  - GET/POST: https:///sap/opu/odata/sap/API_PRODUCT_SRV/A_Product
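To illustrate the ETag/If-Match point from the performance list, here is a hypothetical in-memory store (not SAP code) that shows optimistic concurrency. Every update must present the ETag it last read. A stale ETag gets `412 Precondition Failed` instead of silently overwriting someone else's change. The version counter is an assumption; RAP typically derives the ETag from a field such as a last-changed timestamp.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical store illustrating ETag / If-Match optimistic concurrency
class EtagStore {
    private record Versioned(String value, long version) {}

    private final Map<String, Versioned> data = new HashMap<>();

    // Creates the resource and returns its first ETag
    String create(String key, String value) {
        data.put(key, new Versioned(value, 1));
        return etag(1);
    }

    // Returns an HTTP-like status: 204 updated, 404 unknown key, 412 stale ETag
    int update(String key, String newValue, String ifMatch) {
        Versioned current = data.get(key);
        if (current == null) {
            return 404;
        }
        if (!etag(current.version()).equals(ifMatch)) {
            return 412; // Precondition Failed: someone else changed it since our last read
        }
        data.put(key, new Versioned(newValue, current.version() + 1));
        return 204;
    }

    String etagOf(String key) {
        return etag(data.get(key).version());
    }

    private static String etag(long version) {
        return "W/\"" + version + "\"";
    }
}
```

Two clients that both read version 1 can race to update. The first wins, and the second receives 412 and must re-read before retrying.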

Useful links:

ABAP RAP Overview: https://help.sap.com/docs/abap-cloud/abap-rap
SAP API Business Hub: https://api.sap.com/

2) In the Spring Boot application

2.a) Steps to authenticate against the SAP API

Option 1: OAuth 2.0 Client Credentials (recommended for Cloud/BTP/S/4HANA Cloud or via API Management)

Flow:
- Obtain an access_token from the token endpoint (client_id, client_secret).
- Send it in the Authorization: Bearer header on every API call.
Benefits: secret rotation, scopes, centralized revocation.

Option 2: Basic Auth (possible on-prem)

Flow:
- Send Authorization: Basic <base64(user:password)> over HTTPS.
Notes: Less flexible; use technical accounts with minimal privileges and keep the secret in a vault.
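For reference, the Basic header (RFC 7617) is simply the word `Basic` followed by the Base64 encoding of `user:password`. Base64 is trivially reversible, which is why Basic Auth must only travel over HTTPS. `BasicAuth` is a hypothetical helper; Spring's `HttpHeaders.setBasicAuth(user, password)` builds the same value for you.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: value of the Authorization header for HTTP Basic authentication
class BasicAuth {
    static String header(String user, String password) {
        String credentials = user + ":" + password;
        // Base64 is an encoding, not encryption: anyone who sees the header can decode it
        return "Basic " + Base64.getEncoder()
            .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }
}
```

For the RFC's own example credentials, `BasicAuth.header("Aladdin", "open sesame")` yields `Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==`.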

CSRF (regardless of the authentication method):

For POST/PUT/PATCH/DELETE in OData, request a token first:
- Send a preliminary call (e.g., GET on $metadata, or HEAD) with X-CSRF-Token: Fetch.
- Reuse the returned X-CSRF-Token and the session cookie (e.g., SAP_SESSIONID) on the write call.
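The CSRF steps can be sketched independently of any HTTP library. Take the headers of the `Fetch` response (for example from `java.net.http.HttpResponse.headers().map()`), keep the token and the `name=value` part of each `Set-Cookie`, and send both on the write call. `SapCsrfSession` is a hypothetical name.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical holder for the two things a SAP OData write needs: the CSRF token and its session cookies
record SapCsrfSession(String token, String cookieHeader) {

    // Builds the session from the headers of the "X-CSRF-Token: Fetch" response
    static SapCsrfSession fromResponseHeaders(Map<String, List<String>> headers) {
        // HTTP header names are case-insensitive
        Map<String, List<String>> ci = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        ci.putAll(headers);
        List<String> tokens = ci.getOrDefault("X-CSRF-Token", List.of());
        if (tokens.isEmpty()) {
            throw new IllegalStateException("SAP did not return an X-CSRF-Token");
        }
        // Keep only "name=value" from each Set-Cookie; attributes such as path/HttpOnly are not sent back
        String cookies = ci.getOrDefault("Set-Cookie", List.of()).stream()
            .map(c -> c.split(";", 2)[0].trim())
            .collect(Collectors.joining("; "));
        return new SapCsrfSession(tokens.get(0), cookies);
    }

    // Headers to add to POST/PUT/PATCH/DELETE requests
    Map<String, String> writeHeaders() {
        return Map.of("X-CSRF-Token", token, "Cookie", cookieHeader);
    }
}
```

Forgetting the cookies is a common cause of a 403 on the write call that reports CSRF token validation failed. The token is only valid for the session that issued it.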

TLS:

Import the SAP server certificate into a truststore (JKS/P12).
Configure the WebClient/HTTP client to trust only that truststore (do not disable validation!).

Timeouts, retries, and resilience:

Connection, read, and write timeouts.
Retry with backoff for 429/503/504 (never retry 4xx validation errors).
Circuit breaker and bulkhead if needed.
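Here is a sketch of the retry rules above, assuming the caller drives the loop and sleeps `backoff(attempt)` between tries. Only 429/503/504 are retried, the delay doubles per attempt up to a cap, and optional full jitter spreads out retries from many clients. `RetryPolicy` is hypothetical. Resilience4j's `Retry` or Reactor's `reactor.util.retry.Retry.backoff(...)` provide the same behavior off the shelf.

```java
import java.time.Duration;
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical retry policy: retry only transient statuses, with capped exponential backoff
class RetryPolicy {
    // 429 Too Many Requests, 503 Service Unavailable, 504 Gateway Timeout
    private static final Set<Integer> RETRYABLE = Set.of(429, 503, 504);

    private final int maxAttempts;
    private final Duration baseDelay;
    private final Duration maxDelay;

    RetryPolicy(int maxAttempts, Duration baseDelay, Duration maxDelay) {
        this.maxAttempts = maxAttempts;
        this.baseDelay = baseDelay;
        this.maxDelay = maxDelay;
    }

    // attempt is 1-based: the number of the call that just failed with 'status'
    boolean shouldRetry(int status, int attempt) {
        return RETRYABLE.contains(status) && attempt < maxAttempts;
    }

    // Delay before the next call: base, 2 x base, 4 x base, ... never above maxDelay
    Duration backoff(int attempt) {
        int shift = Math.min(Math.max(attempt - 1, 0), 20); // bound the shift to avoid overflow
        long millis = baseDelay.toMillis() << shift;
        return Duration.ofMillis(Math.min(millis, maxDelay.toMillis()));
    }

    // "Full jitter": a random delay in [0, backoff] so many clients do not retry in lockstep
    Duration backoffWithJitter(int attempt) {
        long upper = backoff(attempt).toMillis();
        return Duration.ofMillis(ThreadLocalRandom.current().nextLong(upper + 1));
    }
}
```

With a 200 ms base and a 2 s cap, the delays are 200, 400, 800, 1600, then 2000 ms. A 400 Bad Request is never retried, because sending the same invalid payload again cannot succeed.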

2.b) Code example (Spring Boot): POST to create a Product

Below is an example with:

WebClient (preferred) with timeouts.
OAuth2 Client Credentials.
The CSRF flow for OData.
Error handling with OData error parsing.
A Basic Auth alternative (as a note).

Minimal structure:

application.yml
DTOs: ProductRequest, SapODataError
Configuration: WebClientConfig (includes SSL, OAuth, timeouts)
Client: SapProductClient (fetchCsrf + createProduct)
Usage example in a Service/Controller

application.yml (example)

server:
  port: 8080

sap:
  base-url: "https://sap.example.com/sap/opu/odata4/zns/product/0001"
  product-path: "/Products"
  # For OData v2 (standard API), something like:
  # base-url: "https://sap.example.com/sap/opu/odata/sap/API_PRODUCT_SRV"
  # product-path: "/A_Product"

security:
  oauth2:
    client:
      sap:
        token-uri: "https://sap-iam.example.com/oauth/token"
        client-id: "${SAP_CLIENT_ID}"
        client-secret: "${SAP_CLIENT_SECRET}"
        scope: "Product.Write"

http:
  connect-timeout-ms: 5000
  read-timeout-ms: 10000
  write-timeout-ms: 10000

ssl:
  enabled: true
  truststore:
    path: "classpath:truststore/sap-truststore.p12"
    password: "${TRUSTSTORE_PASSWORD}"
    type: "PKCS12"

Keep secrets (client-secret, passwords) in environment variables or a vault.

DTOs

// Product sent to SAP for creation (adjust the fields to match the actual SAP service)
public record ProductRequest(
    String ProductId,
    String SKU,
    String Name,
    String Description,
    String Unit,
    String WarehouseId,
    java.math.BigDecimal Price,
    java.time.OffsetDateTime LastUpdated
) {}

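import com.fasterxml.jackson.annotation.JsonIgnoreProperties;

// Typical OData V2 error (simplified): {"error": {"code": "...", "message": {"lang": "en", "value": "..."}}}
// In OData V4 "message" is a plain string, so parsing into this class fails and the caller falls back to the raw body.
@JsonIgnoreProperties(ignoreUnknown = true) // SAP also sends "innererror" and other fields
public class SapODataError {
  public Error error;

  @JsonIgnoreProperties(ignoreUnknown = true)
  public static class Error {
    public String code;
    public Message message;
  }

  @JsonIgnoreProperties(ignoreUnknown = true)
  public static class Message {
    public String lang;
    public String value;
  }
}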

Note: In the standard API_PRODUCT_SRV service (OData v2), the payload and fields are different (e.g., Product, ProductType, BaseUnit, ProductGroup, etc.). Align the DTO with your endpoint's metadata ($metadata).

Configuring the WebClient with OAuth2, SSL, and timeouts

import io.netty.channel.ChannelOption;
import io.netty.handler.ssl.SslContextBuilder;
import io.netty.handler.timeout.WriteTimeoutHandler;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.DefaultResourceLoader;
import org.springframework.http.client.reactive.ReactorClientHttpConnector;
import org.springframework.util.StringUtils;
import org.springframework.web.reactive.function.client.ExchangeFilterFunction;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;
import reactor.netty.http.client.HttpClient;

import javax.net.ssl.TrustManagerFactory;
import java.io.InputStream;
import java.security.KeyStore;
import java.time.Duration;
import java.util.concurrent.TimeUnit;

@Configuration
public class WebClientConfig {

  private static final Logger log = LoggerFactory.getLogger(WebClientConfig.class);

  @Value("${sap.base-url}")
  private String sapBaseUrl;

  @Value("${http.connect-timeout-ms:5000}")
  private int connectTimeoutMs;

  @Value("${http.read-timeout-ms:10000}")
  private int readTimeoutMs;

  @Value("${http.write-timeout-ms:10000}")
  private int writeTimeoutMs;

  @Value("${ssl.enabled:true}")
  private boolean sslEnabled;

  @Value("${ssl.truststore.path:}")
  private String truststorePath;

  @Value("${ssl.truststore.password:}")
  private String truststorePassword;

  @Value("${ssl.truststore.type:PKCS12}")
  private String truststoreType;

  @Bean
  public WebClient sapWebClient(WebClient.Builder builder) throws Exception {
    HttpClient httpClient = HttpClient.create()
        // Connection timeout (TCP connect)
        .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, connectTimeoutMs)
        // Read timeout: maximum wait for the response
        .responseTimeout(Duration.ofMillis(readTimeoutMs))
        // Write timeout: maximum time to write the request to the socket
        .doOnConnected(conn -> conn.addHandlerLast(
            new WriteTimeoutHandler(writeTimeoutMs, TimeUnit.MILLISECONDS)));

    if (sslEnabled) {
      // Built up front so that a bad truststore fails at startup, not on the first request
      SslContextBuilder sslBuilder = SslContextBuilder.forClient();
      if (StringUtils.hasText(truststorePath)) {
        KeyStore ts = KeyStore.getInstance(truststoreType);
        // Accepts "classpath:..." and "file:..." locations
        try (InputStream is = new DefaultResourceLoader().getResource(truststorePath).getInputStream()) {
          ts.load(is, truststorePassword.toCharArray());
        }
        TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(ts);
        sslBuilder.trustManager(tmf);
      }
      // Without a truststore, the JVM default trust store (cacerts) is used; validation is never disabled
      var sslContext = sslBuilder.build();
      httpClient = httpClient.secure(spec -> spec.sslContext(sslContext));
    }

    return builder
        .baseUrl(sapBaseUrl)
        .clientConnector(new ReactorClientHttpConnector(httpClient))
        .defaultHeader("Accept", "application/json")
        .filter(loggingFilter())
        .build();
  }

  private ExchangeFilterFunction loggingFilter() {
    return ExchangeFilterFunction.ofRequestProcessor(clientRequest -> {
      // Lean log without headers (they carry tokens and cookies)
      log.info("SAP Request: {} {}", clientRequest.method(), clientRequest.url());
      return Mono.just(clientRequest);
    });
  }
}

OAuth2 authentication: we can obtain the token with a small token client and apply an ExchangeFilterFunction that adds the Authorization header.

import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;
import org.springframework.util.StringUtils;
import org.springframework.web.reactive.function.BodyInserters;
import org.springframework.web.reactive.function.client.WebClient;

import java.time.Instant;
import java.util.Map;

@Component
public class OAuthTokenClient {

  private final WebClient http;
  private final String clientId;
  private final String clientSecret;
  private final String scope;

  // Cached token, reused until 30 s before it expires
  private volatile String cachedToken;
  private volatile Instant expiresAt = Instant.EPOCH;

  public OAuthTokenClient(
      WebClient.Builder builder,
      @Value("${security.oauth2.client.sap.token-uri}") String tokenUri,
      @Value("${security.oauth2.client.sap.client-id}") String clientId,
      @Value("${security.oauth2.client.sap.client-secret}") String clientSecret,
      @Value("${security.oauth2.client.sap.scope:}") String scope
  ) {
    this.http = builder.baseUrl(tokenUri).build();
    this.clientId = clientId;
    this.clientSecret = clientSecret;
    this.scope = scope;
  }

  // synchronized: concurrent callers wait for one refresh instead of each hitting the token endpoint
  public synchronized String getAccessToken() {
    if (cachedToken != null && Instant.now().isBefore(expiresAt.minusSeconds(30))) {
      return cachedToken;
    }
    // fromFormData URL-encodes each value; string concatenation breaks on secrets containing '&', '+' or '='
    var form = BodyInserters.fromFormData("grant_type", "client_credentials")
        .with("client_id", clientId)
        .with("client_secret", clientSecret);
    if (StringUtils.hasText(scope)) {
      form = form.with("scope", scope);
    }

    Map<?, ?> token = http.post()
        .body(form)
        .retrieve()
        .bodyToMono(Map.class)
        .block();

    if (token == null || !(token.get("access_token") instanceof String accessToken)) {
      throw new IllegalStateException("Token endpoint did not return an access_token");
    }
    Number expiresIn = (Number) token.get("expires_in");
    cachedToken = accessToken;
    expiresAt = Instant.now().plusSeconds(expiresIn != null ? expiresIn.longValue() : 300);
    return cachedToken;
  }
}

Filter that applies the Bearer token:

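import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.reactive.function.client.ClientRequest;
import org.springframework.web.reactive.function.client.ExchangeFilterFunction;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

@Configuration
public class OAuthFilterConfig {

  @Bean
  public ExchangeFilterFunction oauthBearerFilter(OAuthTokenClient tokenClient) {
    return ExchangeFilterFunction.ofRequestProcessor(req ->
        // getAccessToken() may block on the token endpoint, so run it off the Netty event-loop threads
        Mono.fromCallable(tokenClient::getAccessToken)
            .subscribeOn(Schedulers.boundedElastic())
            .map(token -> ClientRequest.from(req)
                .headers(h -> h.setBearerAuth(token))
                .build()));
  }
}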

Then plug this filter into the WebClient (in the sapWebClient bean), for example:

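// In WebClientConfig: receive the filter bean as a parameter of the @Bean method
@Bean
public WebClient sapWebClient(WebClient.Builder builder,
                              ExchangeFilterFunction oauthBearerFilter) throws Exception {
  // ... HttpClient, timeout and SSL setup unchanged ...
  return builder
      .baseUrl(sapBaseUrl)
      .clientConnector(new ReactorClientHttpConnector(httpClient))
      .defaultHeader("Accept", "application/json")
      .filter(oauthBearerFilter) // adds "Authorization: Bearer <token>" to every request
      .filter(loggingFilter())
      .build();
}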

Basic Auth alternative (if you choose it):

Instead of the filter above, use .defaultHeaders(h -> h.setBasicAuth(user, password)) on the WebClient builder.

SAP client: fetching the CSRF token and creating a Product

import org.springframework.beans.factory.annotation.Value;
import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.stereotype.Component;
import org.springframework.web.reactive.function.client.WebClient;

import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

@Component
public class SapProductClient {

  private final WebClient sapWebClient;
  private final String productPath;

  public SapProductClient(
      WebClient sapWebClient,
      @Value("${sap.product-path}") String productPath
  ) {
    this.sapWebClient = sapWebClient;
    this.productPath = productPath;
  }

  // 1) Obtain the X-CSRF-Token and the session cookies it is bound to
  public CsrfSession fetchCsrf() {
    // toBodilessEntity() consumes and releases the response inside the exchange; only headers are needed
    ResponseEntity<Void> resp = sapWebClient.get()
        // For OData V4: call $metadata, or the entity set with $top=1
        .uri("/$metadata")
        .header("X-CSRF-Token", "Fetch")
        .exchangeToMono(r -> r.toBodilessEntity())
        .block();

    if (resp == null) {
      throw new IllegalStateException("No response from SAP when requesting the CSRF token");
    }

    String token = resp.getHeaders().getFirst("X-CSRF-Token");
    List<String> setCookies = resp.getHeaders().getOrDefault(HttpHeaders.SET_COOKIE, List.of());
    if (token == null || setCookies.isEmpty()) {
      // Some endpoints only return the token on write calls; in OData, fetching on $metadata usually works
      throw new IllegalStateException("Could not obtain the CSRF token and/or session cookies from SAP (HTTP "
          + resp.getStatusCode().value() + ")");
    }

    // Send back only "name=value" from each Set-Cookie (drop path, HttpOnly, etc.)
    String cookieHeader = setCookies.stream()
        .map(c -> c.split(";", 2)[0].trim())
        .collect(Collectors.joining("; "));

    return new CsrfSession(token, cookieHeader);
  }

  // 2) Create the Product (POST)
  public Map<?, ?> createProduct(ProductRequest product) {
    CsrfSession csrf = fetchCsrf();

    // toEntity(String.class) reads status, headers and body inside the exchange,
    // so the response is fully consumed before it is released
    ResponseEntity<String> resp = sapWebClient.post()
        .uri(productPath) // e.g. "/Products" or "/A_Product"
        .contentType(MediaType.APPLICATION_JSON)
        .header("X-CSRF-Token", csrf.token())
        .header(HttpHeaders.COOKIE, csrf.cookieHeader())
        .header("X-Correlation-Id", Correlation.id()) // if you use correlation IDs
        .bodyValue(product)
        .exchangeToMono(r -> r.toEntity(String.class))
        .block();

    if (resp == null) {
      throw new IllegalStateException("No response from SAP when creating the Product");
    }

    String raw = Optional.ofNullable(resp.getBody()).orElse("");
    int status = resp.getStatusCode().value();

    if (resp.getStatusCode().is2xxSuccessful()) {
      return raw.isEmpty() ? Map.of() : JsonUtils.fromJson(raw, Map.class);
    }

    // OData error handling: try the structured error, fall back to the raw body
    SapODataError err = null;
    try {
      err = JsonUtils.fromJson(raw, SapODataError.class);
    } catch (RuntimeException ignore) {
      // Not an OData V2 error body (e.g. OData V4, or an HTML page from a proxy)
    }

    String msg = Optional.ofNullable(err)
        .map(e -> e.error != null && e.error.message != null ? e.error.message.value : null)
        .orElse(raw);

    // ConflictException, ClientErrorException and ServerErrorException are application-defined exceptions
    if (status == 409) {
      throw new ConflictException("Conflict while creating Product: " + msg);
    } else if (resp.getStatusCode().is4xxClientError()) {
      throw new ClientErrorException("SAP 4xx error: " + msg);
    } else if (resp.getStatusCode().is5xxServerError()) {
      throw new ServerErrorException("SAP 5xx error: " + msg);
    } else {
      throw new IllegalStateException("Unexpected SAP response (" + status + "): " + msg);
    }
  }

  public record CsrfSession(String token, String cookieHeader) {}
}

Simple JSON utility (Jackson):

import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonUtils {
  private static final ObjectMapper MAPPER = new ObjectMapper();
  public static <T> T fromJson(String json, Class<T> type) {
    try {
      return MAPPER.readValue(json, type);
    } catch (Exception e) {
      throw new RuntimeException("Failed to parse JSON", e);
    }
  }
}

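Correlation ID (optional; call Correlation.clear() at the end of each request, for example in a servlet filter, because pooled threads would otherwise reuse the previous request's ID):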

import java.util.UUID;

public class Correlation {
  private static final ThreadLocal<String> ID = new ThreadLocal<>();

  public static String id() {
    String v = ID.get();
    if (v == null) {
      v = UUID.randomUUID().toString();
      ID.set(v);
    }
    return v;
  }

  public static void clear() {
    ID.remove();
  }
}

Usage example in a Service/Controller:

import java.util.Map;
import org.springframework.stereotype.Service;

@Service
public class ProductService {

  private final SapProductClient sapProductClient;

  public ProductService(SapProductClient sapProductClient) {
    this.sapProductClient = sapProductClient;
  }

  public Map<?, ?> createProduct(ProductRequest req) {
    // Local validation
    if (req.ProductId() == null || req.ProductId().isBlank()) {
      throw new IllegalArgumentException("ProductId is required");
    }
    // Idempotency (optional): query first, or rely on a natural key such as ProductId
    return sapProductClient.createProduct(req);
  }
}

Handling timeouts and retries:

You can wrap the calls with Resilience4j (the annotations also need spring-boot-starter-aop on the classpath):

<!-- pom.xml -->
<dependency>
  <groupId>io.github.resilience4j</groupId>
  <artifactId>resilience4j-spring-boot3</artifactId>
  <version>2.2.0</version>
</dependency>

// Example: retry only transient failures (429/503/504).
// Resilience4j decides whether to retry from the exception types listed in
// application.yml, so no try/catch is needed here. Note that createProduct()
// above converts error responses into its own exceptions (ClientErrorException,
// ServerErrorException), so list the types your client actually throws.
import io.github.resilience4j.retry.annotation.Retry;

@Retry(name = "sapCreateProduct")
public Map<?, ?> createProductWithRetry(ProductRequest req) {
  return sapProductClient.createProduct(req);
}

application.yml (Resilience4j):

resilience4j:
  retry:
    instances:
      sapCreateProduct:
        max-attempts: 3
        wait-duration: 1s
        enable-exponential-backoff: true
        exponential-backoff-multiplier: 2
        retry-exceptions:
          - org.springframework.web.reactive.function.client.WebClientResponseException$TooManyRequests
          - org.springframework.web.reactive.function.client.WebClientResponseException$ServiceUnavailable
          - org.springframework.web.reactive.function.client.WebClientResponseException$GatewayTimeout
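Resilience4j can apply exponential backoff through its configuration. The sketch below shows the underlying logic with the standard library only, assuming each call is modeled as an `IntSupplier` that returns the HTTP status. `RetryPolicy` is a hypothetical helper, not the Resilience4j API.

```java
import java.util.Set;
import java.util.function.IntSupplier;

// Hypothetical helper illustrating retry with exponential backoff;
// this is not the Resilience4j API.
final class RetryPolicy {
    // Transient statuses worth retrying; 4xx validation errors are never retried.
    private static final Set<Integer> RETRYABLE = Set.of(429, 503, 504);

    private final int maxAttempts;
    private final long baseDelayMillis;

    RetryPolicy(int maxAttempts, long baseDelayMillis) {
        this.maxAttempts = maxAttempts;
        this.baseDelayMillis = baseDelayMillis;
    }

    /** Delay before retry number {@code attempt} (1-based): base * 2^(attempt - 1). */
    long delayBeforeRetry(int attempt) {
        return baseDelayMillis * (1L << (attempt - 1));
    }

    /** Invokes {@code call} until it returns a non-retryable status or attempts run out. */
    int execute(IntSupplier call) {
        int status = call.getAsInt();
        for (int attempt = 1; attempt < maxAttempts && RETRYABLE.contains(status); attempt++) {
            try {
                Thread.sleep(delayBeforeRetry(attempt));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop retrying
                return status;
            }
            status = call.getAsInt();
        }
        return status;
    }
}
```

Retrying a POST after a 503/504 can create a duplicate if the first request actually reached SAP, so this pairs naturally with the natural-key/409 idempotency approach described in the best practices.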

Notes on OData and CSRF:

Some proxies/API Management layers handle CSRF themselves; if the endpoint no longer requires the token, remove the fetch step. In general, SAP OData services require X-CSRF-Token for mutating methods.
For OData V2, the metadata path is /$metadata under the service root; the same applies to V4 (RAP).
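As written, createProduct calls fetchCsrf() before every POST, which doubles the round trips. A common alternative is to cache the token and cookies and refetch only when SAP rejects them. The sketch below assumes the server signals a stale token with HTTP 403 plus the response header `X-CSRF-Token: Required` (the usual SAP Gateway behavior, but verify against your system); `TokenSession`, `SapResponse` and `CsrfTokenCache` are hypothetical types.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical types: a token+cookie session and a minimal response shape.
record TokenSession(String token, String cookieHeader) {}
record SapResponse(int status, String csrfHeader) {}

final class CsrfTokenCache {
    private final Supplier<TokenSession> fetcher; // e.g. GET $metadata with "X-CSRF-Token: Fetch"
    private TokenSession cached;

    CsrfTokenCache(Supplier<TokenSession> fetcher) {
        this.fetcher = fetcher;
    }

    /** Returns the cached session, fetching a new one only when none is cached. */
    synchronized TokenSession current() {
        if (cached == null) {
            cached = fetcher.get();
        }
        return cached;
    }

    synchronized void invalidate() {
        cached = null;
    }

    /** Sends a mutating request; on 403 with "X-CSRF-Token: Required", refetches once and retries. */
    SapResponse send(Function<TokenSession, SapResponse> request) {
        SapResponse resp = request.apply(current());
        if (resp.status() == 403 && "Required".equalsIgnoreCase(resp.csrfHeader())) {
            invalidate();
            resp = request.apply(current());
        }
        return resp;
    }
}
```

Under concurrency, two threads may both invalidate and refetch; that costs an extra fetch but stays correct because each retry uses whatever session is current.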

Alternative with RestTemplate (it still works, but Spring has placed it in maintenance mode):

RestTemplate rest = new RestTemplateBuilder()
    .setConnectTimeout(Duration.ofMillis(5000))
    .setReadTimeout(Duration.ofMillis(10000))
    .build();
// Fetch the CSRF token via exchange(), then resend the cookies + header on mutating calls

3) Considerations and best practices

HTTPS required:
- Verify the TLS handshake and the certificate chain in the QA/staging environment before production.
Payload format and types:
- Use application/json; check that the service accepts and returns JSON (OData V2/V4 usually support it).
SAP error handling:
- OData returns an envelope with "error.code" and "error.message.value".
- Map 400/404/409 and 5xx. Do not retry 4xx validation errors.
Idempotency:
- For creation, consider:
  - Using a natural key (ProductId) and treating 409 (conflict) as "already created".
  - Or using an idempotency header (if exposed via API Management) with deduplication in SAP.
Concurrency:
- Use ETag/If-Match on updates (PATCH) to avoid lost updates.
Client performance:
- Persistent connections (HTTP/1.1 keep-alive, or HTTP/2 if supported).
- GZIP enabled.
- Batch requests ($batch) for large volumes.
Observability:
- Log the X-Correlation-Id on both sides to trace flows.
- Metrics and alerts (latency, error rate, timeouts).
Secrets security:
- Never commit secrets. Use Vault/Secrets Manager/KMS.
- Rotate credentials periodically.
Code standards:
- Immutable DTOs, validation with Bean Validation.
- Separate layers (Controller/Service/Client).
- Integration tests against a sandbox/API Hub when available.
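To make the ETag/If-Match point concrete, here is a minimal sketch of the server-side semantics, using a hypothetical in-memory `VersionedStore` (not SAP code): a PATCH carrying an outdated ETag gets 412 Precondition Failed instead of silently overwriting a newer change.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a server entity with an ETag (not SAP code).
final class VersionedStore {
    private record Entry(double price, long version) {}

    private final Map<String, Entry> products = new HashMap<>();

    /** Creates the entity and returns its initial ETag. */
    String create(String id, double price) {
        products.put(id, new Entry(price, 1));
        return etag(1);
    }

    /** Returns the current ETag, as a GET would in its ETag response header. */
    String read(String id) {
        return etag(products.get(id).version());
    }

    double price(String id) {
        return products.get(id).price();
    }

    /** Returns the HTTP status a PATCH with the given If-Match value would get. */
    int patchPrice(String id, String ifMatch, double newPrice) {
        Entry current = products.get(id);
        if (current == null) return 404;
        if (!etag(current.version()).equals(ifMatch)) return 412; // Precondition Failed: stale copy
        products.put(id, new Entry(newPrice, current.version() + 1)); // new version => new ETag
        return 204;
    }

    private static String etag(long version) {
        return "W/\"" + version + "\"";
    }
}
```

On the client side, a 412 means "re-read, re-apply your change, and try again" rather than an error to retry blindly.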

CRUD call examples (generic)

Assuming RAP OData V4, base URL: https://sap.example.com/sap/opu/odata4/zns/product/0001

Create (POST):
- POST /Products
- Headers: Authorization: Bearer <access-token>, X-CSRF-Token: <csrf-token>, Cookie: <session>, Content-Type: application/json
- Body (example; adapt it to your metadata): { "ProductId": "PRD-001", "SKU": "SKU-001", "Name": "Product X", "Description": "Description", "Unit": "EA", "WarehouseId": "WH01", "Price": 99.90, "LastUpdated": "2026-02-20T10:00:00Z" }
Read (GET):
- GET /Products('PRD-001')?$select=ProductId,Name,SKU
- GET /Products?$filter=WarehouseId eq 'WH01'&$top=50&$skip=0
Update (PATCH):
- PATCH /Products('PRD-001')
- Headers: X-CSRF-Token, If-Match: <etag>
- Body: { "Price": 109.90 }
Delete (DELETE):
- DELETE /Products('PRD-001')
- Headers: X-CSRF-Token, If-Match: <etag>

For OData V2 (API_PRODUCT_SRV), adapt the endpoints and field names to the $metadata.
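The key and filter URLs above need escaping once real values are involved: OData string literals double any embedded single quote, and the query must be percent-encoded. A minimal sketch with a hypothetical `ODataPaths` helper; entity and field names are the example ones above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for building OData V4 request paths; adjust names to your $metadata.
final class ODataPaths {
    /** Percent-encodes a value for a URL, using %20 (not '+') for spaces. */
    static String encode(String s) {
        // URLEncoder turns a literal '+' into %2B first, so this only rewrites spaces.
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    /** OData string literal: single quotes inside the value are escaped by doubling them. */
    static String stringLiteral(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** e.g. entityByKey("Products", "PRD-001") -> /Products('PRD-001') */
    static String entityByKey(String entitySet, String key) {
        return "/" + entitySet + "('" + encode(key.replace("'", "''")) + "')";
    }

    /** Builds "$filter=<field> eq '<value>'&$top=..&$skip=.." with the filter expression encoded. */
    static String filterEqPage(String field, String value, int top, int skip) {
        String filter = field + " eq " + stringLiteral(value);
        return "$filter=" + encode(filter) + "&$top=" + top + "&$skip=" + skip;
    }
}
```

Because these strings are already encoded, passing them to WebClient as a `java.net.URI` (for example `URI.create(baseUrl + path)`) is safer than a URI template, which may encode them a second time.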

Inventory (WMS): quick note

If you also need to update inventory, check whether there is a standard EWM API (for example, for quantities, warehouse tasks, or stock changes). Otherwise, create a RAP service for an Inventory/Stock entity (keys such as ProductId + WarehouseId + StorageBin, etc.), following the same security and performance guidelines.
Consistency between Product and Inventory:
- If they are two separate endpoints, handle consistency with a saga (compensating actions) or use a transactional $batch if the service supports it.
- Idempotency is critical for inventory events (to avoid double counting).
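A minimal sketch of idempotent inventory updates, assuming each stock event carries a unique event ID. `StockEvent` and `StockLedger` are hypothetical, not an SAP EWM API; a redelivered event (for example after a retry) is recognized and ignored instead of being counted twice.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stock event and ledger illustrating idempotent consumption
// keyed by an event ID; not an SAP EWM API.
record StockEvent(String eventId, String productId, String warehouseId, int delta) {}

final class StockLedger {
    private final Set<String> processed = new HashSet<>();
    private final Map<String, Integer> quantities = new HashMap<>();

    /** Applies the event once; returns false if this eventId was already applied. */
    synchronized boolean apply(StockEvent e) {
        if (!processed.add(e.eventId())) {
            return false; // duplicate delivery: ignore it
        }
        quantities.merge(e.productId() + "@" + e.warehouseId(), e.delta(), Integer::sum);
        return true;
    }

    synchronized int quantity(String productId, String warehouseId) {
        return quantities.getOrDefault(productId + "@" + warehouseId, 0);
    }
}
```

In a real system the processed-ID record and the quantity change must be persisted in the same transaction; otherwise a crash between the two reintroduces duplicates.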

Quick checklist

SAP
- Use a standard API (e.g. Product Master) OR publish a RAP OData V4 service for Product/Inventory.
- Enable HTTPS with TLS 1.2+ and strong ciphers.
- Configure authentication (OAuth 2.0 Client Credentials preferred).
- Define roles/scopes and per-operation authorization.
- Enable/use X-CSRF-Token for mutating methods.
- Performance: $select/$filter/pagination/$batch/GZIP, ETags.
- Logs/auditing; correlate requests.
Spring Boot
- WebClient with timeouts and a configured TLS truststore.
- OAuth2 token client or Basic Auth.
- CSRF flow: fetch once, then reuse the token and cookies.
- OData error handling (parse "error.message.value").
- Retries with backoff for 429/503/504; do not retry 4xx validation errors.
- Observability, correlation, secrets security.

Conclusion

In SAP, prefer standard APIs. If none exists, create an OData V4 service with RAP, with full CRUD for Product, OAuth 2.0 security, HTTPS, CSRF, roles/scopes, and performance best practices (ETags, $select, pagination).
In Spring Boot, use WebClient with OAuth2 Client Credentials, configure SSL with a truststore, implement the CSRF flow, handle OData errors properly, apply timeouts/retries with backoff, and keep logs and correlation IDs.
For Inventory/WMS, evaluate the standard EWM APIs; if there are none, model and expose an Inventory/Stock entity following the same standards.

If you like, I can:

Validate your endpoint's $metadata to adjust the DTO and fields exactly.
Provide an example using the standard API_PRODUCT_SRV API with real fields.
Build a $batch flow for bulk creation with idempotency.

DotNet WhenAll LoadAsync

February 6, 2026

Luis Fernando Chaim

DotNet

Prompt

How do I avoid Task.WhenAll with multiple LoadAsync calls when using a DbContext?

Google AI Mode

In Entity Framework Core, using Task.WhenAll with the same DbContext results in the error: "A second operation started on this context before a previous operation completed". This happens because DbContext is not thread-safe and does not support concurrent operations on the same instance.

To fix this and avoid the problematic Task.WhenAll, you have three main options:

1. Use IDbContextFactory (recommended for parallelism)

If you need to load data in parallel for performance, the official solution is to inject IDbContextFactory. It lets you create an isolated context instance for each task.

csharp

// Injected via the constructor: private readonly IDbContextFactory<MyDbContext> _contextFactory;

public async Task LoadDataInParallel()
{
    var task1 = Task.Run(async () => {
        using var context = _contextFactory.CreateDbContext();
        return await context.Users.ToListAsync();
    });

    var task2 = Task.Run(async () => {
        using var context = _contextFactory.CreateDbContext();
        return await context.Orders.ToListAsync();
    });

    await Task.WhenAll(task1, task2); // Now safe, because each task has its own context
}


2. Sequential execution (simple and safe)

If loading everything at the same time is not performance-critical, the safest way to avoid the error is simply to await each call individually. This guarantees that only one operation uses the context (and its connection) at a time.

csharp

var users = await _context.Users.ToListAsync();
var orders = await _context.Orders.ToListAsync(); // Starts only after the previous one finishes


3. Single projection (query splitting or joins)

Instead of multiple LoadAsync calls, try consolidating the lookups into a single database query, or use AsSplitQuery() if the problem is loading many related collections (to avoid cartesian explosion).

Summary of when to use each:

IDbContextFactory: Use when parallelism is essential (e.g. complex dashboards or Blazor Server).
Sequential await: Use in simple CRUD flows where database latency is low.
Split queries: Use when the multiple "loads" fetch navigation properties of the same entity.

Are you facing this problem in a regular Web API application, or in a Background Service/Blazor scenario?

Prompt

How do I avoid Task.WhenAll with multiple LoadAsync calls?

Google AI Mode

To avoid Task.WhenAll when handling multiple LoadAsync calls, you can adopt different patterns depending on your goal (performance, concurrency control, or execution order):

1. Processing in completion order (Task.WhenAny)

If you do not want to wait for all of them to finish before processing results, remove tasks from a list as they complete. This improves responsiveness.

csharp

var tasks = new List<Task<Data>> { LoadAsync(1), LoadAsync(2), LoadAsync(3) };
while (tasks.Count > 0)
{
    var finishedTask = await Task.WhenAny(tasks);
    tasks.Remove(finishedTask);
    var data = await finishedTask; // Process each result as soon as it finishes
}


2. Concurrency control (SemaphoreSlim)

To avoid overloading the system with many simultaneous LoadAsync calls (WhenAll itself imposes no limit, because every task is already running once it is created), use a SemaphoreSlim to cap how many run at the same time.

csharp

var semaphore = new SemaphoreSlim(3); // At most 3 at a time
var tasks = items.Select(async item => {
    await semaphore.WaitAsync();
    try { return await LoadAsync(item); }
    finally { semaphore.Release(); }
});
await Task.WhenAll(tasks); // Still uses WhenAll, but with throttled execution


3. Modern asynchronous iteration (Parallel.ForEachAsync)

In .NET 6+, a good alternative to WhenAll for collections is Parallel.ForEachAsync. It throttles the degree of parallelism for you (by default Environment.ProcessorCount, configurable through ParallelOptions.MaxDegreeOfParallelism) and suits processing large volumes of items.

csharp

await Parallel.ForEachAsync(items, async (item, token) => {
    await LoadAsync(item); // Parallel execution with built-in throttling
});


4. Sequential execution (plain await)

If the loads depend on one another or their order matters, drop parallelism and await each one individually. This avoids the complexity of managing multiple tasks.

csharp

var res1 = await LoadAsync(1);
var res2 = await LoadAsync(2); // Starts only after the first one finishes


What is the specific scenario for your LoadAsync calls (e.g. many API calls, file reads, or UI loading)?

.NET Code Analysis

February 5, 2026

Luis Fernando Chaim

DotNet

I can do this assessment, but I cannot access a local path on your Windows machine (C:\Users\...) directly from here. For me to actually analyze the project (code, csproj, layers, DbContext, queries, etc.), you have 3 options:

Send a ZIP of the repository (ideal)
Send only the relevant parts (Data/Infra folder, DbContext, Repositories/Services, Controllers/Handlers, main Entities and Migrations)
Run automated analyses on your side and send me the reports (Sonar/Analyzers/CodeMetrics); I will interpret them and give you an assessment with recommendations

✅ Privacy tip: before sending, remove or mask secrets, connection strings, keys, certificates and any sensitive data (appsettings.* / user-secrets / pipelines).

What I will deliver (once you send the code or the reports)

A structured report with:

Entity Framework misuse (anti-patterns + impact: performance, concurrency, consistency, maintenance)
SOLID violations (by principle, with examples and suggested refactorings)
Clean Code (readability, complexity, duplication, coupling, tests, logs, exceptions)
Severity and prioritization: quick wins (1–2 days), medium term (1–2 sprints), structural (larger refactoring)
Annotated code snippets (before/after) where applicable

To start, tell me 3 quick things

Does the project use EF Core or EF6? (And which .NET version: 6/7/8?)
Is the architecture Clean Architecture, DDD, N-tier, CQRS/MediatR…?
Is the problem you see today mostly performance, bugs, difficulty evolving the code, or all of these?

If you answer that, I will adjust the focus of the analysis.

Checklist of what I will look for (and how I will flag "misuse")

1) Entity Framework: main misuses / anti-patterns

A. Performance and queries

N+1 queries (a loop that fires one query per item). Signs: .ToList() too early and then accessing navigations; uncontrolled lazy loading; foreach with FirstOrDefault/Single against the DbSet.
Client evaluation / premature materialization. Signs: early .AsEnumerable(); projection after materializing; ToList() before Where/Select.
Unnecessary tracking on reads. Signs: read queries without .AsNoTracking() (when nothing will be updated).
Excessive Includes / "cartesian explosion". Signs: many .Include().ThenInclude() on collections; huge result sets; slowness and high memory usage.
Incorrect pagination. Signs: Skip/Take without OrderBy; pagination done in memory.
Inefficient projection. Signs: fetching the whole entity when only 3 fields are needed (no Select into a DTO).

B. DbContext lifetime / concurrency

DbContext as a Singleton (severe). Signs: registered as a singleton; stored in a static field; reused across requests.
DbContext shared in parallel. Signs: Task.WhenAll with the same context; use from multiple threads.
Poorly managed transactions. Signs: multiple SaveChanges() calls in a critical flow without a transaction; mixed connections.

C. Architecture and coupling

"Repository pattern on top of EF" (the "repository of a repository"): it often becomes redundant and makes queries worse. Signs: GenericRepository<T> with GetAll() returning IEnumerable<T>; no controlled IQueryable; poor specifications.
Exposing IQueryable to upper layers without a contract (infrastructure leakage). Signs: services/controllers building arbitrary queries over a DbSet.
Domain coupled to EF. Signs: entities with EF attributes everywhere; business logic mixed with navigations and loading.

D. Modeling and consistency

Anemic entities + logic in a giant Service
Missing domain invariants (entities accept invalid states)
Chaotic, unreviewed migrations; risky schema changes
Misuse of lazy loading (especially outside the request)

✅ Outcome: I usually classify each finding as P0 critical / P1 high / P2 medium / P3 low, with "how to reproduce" and "how to fix".

2) SOLID: what I will measure in your code

SRP (Single Responsibility)

God classes (Service/Manager with 1000+ lines)
Handlers/Controllers doing validation + rules + persistence + integration + mapping + logging all together

OCP (Open/Closed)

Giant switch/case by type/status/channel
Ifs scattered everywhere for predictable variations (e.g. rules per "customer type")

LSP (Liskov)

Inheritance where subclasses break the contract (throw unexpected exceptions; change semantics)

ISP (Interface Segregation)

"Fat" interfaces: an IService with 20 methods, with implementations forced to "not use half of them"

DIP (Dependency Inversion)

Domain/Application depending directly on EF/infrastructure
Services instantiating dependencies with new (no DI)
Dependencies on concrete classes rather than abstractions

📌 In practice, I will point out:

which dependencies point the wrong way and should be inverted
where to separate policies (domain) from mechanisms (infrastructure)
where to apply Strategy/Policy/Specification without overdoing it

3) Clean Code: objective signs I look for

Poor names (generic methods: Process, Handle2, DoWork)
Long methods (deep nesting, many parameters)
High cyclomatic complexity (many if, &&, ||, switch)
Duplicated rules and validations
Exceptions as control flow (try/catch driving logic)
Poor logs (no correlation, no context, logging sensitive data)
Low testability (static methods, hidden dependencies, code without interfaces)
Mixed layers (Controller accessing DbContext directly; Application using HttpClient without an abstraction, etc.)

If you want to run the scan yourself (and just send me the outputs)

Here is a "quick kit" of useful checks (run them and paste the results):

1) Build and warnings

dotnet build -c Release

2) Formatting (dotnet format ships with the .NET 6+ SDK; --verify-no-changes reports issues without rewriting files)

dotnet format --verify-no-changes

3) Recommended analyzers (if you can add them)

SonarAnalyzer.CSharp
StyleCop.Analyzers
Microsoft.CodeAnalysis.NetAnalyzers

4) Metrics (if you use VS Enterprise: Code Metrics)

Or send me a screenshot/export of the Maintainability / Cyclomatic Complexity metrics.

If you send me:

the Top 20 files by complexity
and the Top 20 by duplication / warnings, I can already point out most of the violations.

How you can send it to me for analysis now

✅ Best approach:

Zip voyager (without bin/obj)
Make sure appsettings.Production.json with secrets is not included
Upload the ZIP here

Or, if you cannot send the whole repo:

📂 Send these folders/files (enough for a solid diagnosis):

*.csproj, Directory.Build.props/targets (if any)
Infrastructure/data layer: DbContext, EntityTypeConfigurations, Migrations
Main Services/UseCases/Handlers
Controllers (or minimal API endpoints)
3–5 critical flows (e.g. order creation, closing, calculation, etc.)
An example of a "heavy" query (e.g. a listing with filters)

Next question (so I start the right way)

Would you prefer the assessment to be:

Focused on EF performance (N+1, includes, tracking, pagination, concurrency)
Focused on SOLID/Clean architecture (layers, coupling, cohesion, tests)
Balanced (a bit of each, with prioritization)

If you send me the ZIP (or a slice of the project), I will start the review and return a report with findings and practical recommendations.

SAP vs ERP

January 28, 2026

Luis Fernando Chaim

Artificial Intelligence

Prompt

What are the best solutions to integrate SAP (using a RESTful API) with Oracle ERP (using direct database access)?

Gemini

Integrating SAP with Oracle ERP through such different methods, a modern service-based interface (REST) on one side and legacy direct database access on the other, requires a robust integration layer to ensure security, data integrity, and performance.

Here are the best solutions for this scenario, grouped by architecture:

1. iPaaS (Integration Platform as a Service): the recommended solution

Using a modern integration platform is the safest way to connect these two worlds. These tools have native connectors for both REST APIs and JDBC/ODBC (Oracle).

SAP Integration Suite (BTP): If your company already uses SAP, this is the natural choice. It has ready-made adapters to connect to external databases and transform the data into the JSON/XML format required by the SAP APIs.
MuleSoft or Dell Boomi: Market leaders that act as a universal "translator". They can fetch data from Oracle via SQL, apply business rules, and send the POST/PATCH to the SAP API in an orchestrated way.

2. Custom middleware layer with Python or Node.js

If you want a low-cost (open source) solution and have a development team, you can build an intermediate service.

How it works: A Python script (using libraries such as cx_Oracle, now python-oracledb, for the database and requests for SAP) acts as a worker.
Advantage: Full control over the transformation logic.
Disadvantage: You must maintain the infrastructure, logging, and error handling yourself.

3. Oracle Integration Cloud (OIC)

If the integration's center of gravity is on the Oracle side, OIC is an excellent option. It can read Oracle ERP tables/views natively and expose or consume REST services for SAP.

Logical integration flow

Aspect	Oracle (Database)	Integration layer (middleware)	SAP (REST API)
Action	Read tables/views	Field and value mapping	Call the endpoint
Security	DB credentials / VPN	Encryption and logs	OAuth 2.0 / API key
Format	SQL / relational	JSON / XML	RESTful

Critical considerations

Use views and staging tables: Never connect SAP directly to Oracle's transactional tables. Create views or staging tables in Oracle so that the performance impact is minimal and security is preserved.
Synchronization: Decide whether the integration will be event-driven (database triggers) or batch (processing in batches at scheduled times).
Error handling: Because the database does not return application errors (e.g. "customer not found"), your middleware must capture the error from the SAP API and write that status back to a log table in Oracle.
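The error-handling point above can be sketched as a worker that reads pending staging rows, calls SAP, and writes the outcome back onto each row. Everything here is hypothetical: `StagingRow` stands for a row of an Oracle staging table, and `sapClient` stands for the REST call. In a real worker the status changes would be UPDATE statements via JDBC.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical row from an Oracle staging table (status and error columns included).
final class StagingRow {
    final String id;
    final String payloadJson;
    String status = "PENDING";
    String errorMessage;

    StagingRow(String id, String payloadJson) {
        this.id = id;
        this.payloadJson = payloadJson;
    }
}

// Hypothetical outcome of calling the SAP REST API for one row.
record ApiResult(int status, String message) {}

final class IntegrationWorker {
    private final Function<String, ApiResult> sapClient; // e.g. POST payload to the SAP endpoint

    IntegrationWorker(Function<String, ApiResult> sapClient) {
        this.sapClient = sapClient;
    }

    /** Sends each pending row and writes the SAP outcome back onto it; returns the failed rows. */
    List<StagingRow> process(List<StagingRow> rows) {
        List<StagingRow> failed = new ArrayList<>();
        for (StagingRow row : rows) {
            if (!"PENDING".equals(row.status)) continue; // already handled in a previous run
            ApiResult result = sapClient.apply(row.payloadJson);
            if (result.status() >= 200 && result.status() < 300) {
                row.status = "SENT";
                row.errorMessage = null;
            } else {
                row.status = "ERROR"; // the application error is now visible on the Oracle side
                row.errorMessage = result.status() + ": " + result.message();
                failed.add(row);
            }
        }
        return failed;
    }
}
```

Keeping the status on the staging row also makes reruns safe: rows already marked SENT or ERROR are skipped until someone resets them.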

What data volume do you intend to move, and how critical is it (real time or daily)? With that information, I can help you detail the technical architecture.

Prompt

Would Informatica, as an integration solution, meet this need?

Gemini

Yes, Informatica fully meets this need, and it is one of the market-leading solutions for this hybrid scenario.

Whether through the classic Informatica PowerCenter (on-premises) or the modern IDMC (Intelligent Data Management Cloud), the tool specializes in "joining these two worlds".

Here is how it addresses each point of your problem:

Why is Informatica ideal for this scenario?

Informatica's strength lies in its ability to run high-performance ETL (Extract, Transform, Load), which is essential when accessing Oracle databases directly, while also providing modern connectors for APIs.

1. Oracle connector (database side)

Unlike manual scripts, Informatica uses optimized native connectors that:

Handle large data volumes without overloading the Oracle ERP.
Support change data capture (CDC), sending to SAP only what changed in the database instead of reading the whole table again.
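True CDC typically reads the database's change log. A simpler substitute some teams use on staging views is a timestamp watermark, equivalent to `WHERE LAST_UPDATED > :watermark ORDER BY LAST_UPDATED`. The sketch below shows that watermark technique (not log-based CDC, and not Informatica's implementation); `ChangedRow` and `WatermarkExtractor` are hypothetical.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;

// Hypothetical row shape from a staging view with a LAST_UPDATED column.
record ChangedRow(String id, Instant lastUpdated) {}

final class WatermarkExtractor {
    private Instant watermark;

    WatermarkExtractor(Instant initial) {
        this.watermark = initial;
    }

    /** Returns rows changed after the current watermark, oldest first, then advances the watermark. */
    List<ChangedRow> nextBatch(List<ChangedRow> source) {
        List<ChangedRow> changed = source.stream()
            .filter(r -> r.lastUpdated().isAfter(watermark))
            .sorted(Comparator.comparing(ChangedRow::lastUpdated))
            .toList();
        if (!changed.isEmpty()) {
            watermark = changed.get(changed.size() - 1).lastUpdated();
        }
        return changed;
    }

    Instant watermark() {
        return watermark;
    }
}
```

The watermark approach misses deletes and can skip rows whose timestamps commit late, which is exactly the gap that log-based CDC closes.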

2. REST API connector (SAP side)

Informatica has a Web Services/REST connector that:

Consumes the SAP API's JSON/XML natively.
Manages authentication automatically (OAuth 2.0, API keys).
Orchestrates the flow: it can read from Oracle, transform the data (e.g. convert the database date format to the format the API expects), and make the REST call.

3. Data transformation

This is where it shines. Because SAP and Oracle ERP rarely share the same data structure, Informatica offers a visual interface for complex field mapping, null handling, and business-rule validation before the data reaches the target.

Which version to choose?

Solution	Profile	Best use
Informatica IDMC (Cloud)	Modern / SaaS	If you want agility and integration with SAP S/4HANA in the cloud.
Informatica PowerCenter	Robust / On-premises	If your Oracle and SAP systems are in your own data center and you need full control of the infrastructure.

Points of attention

Licensing cost: Informatica is an enterprise tool. If the integration volume is very small (only 1 or 2 tables), the cost may be high compared with lighter solutions or iPaaS.
Learning curve: It requires professionals trained in the tool to configure mappings and workflows correctly.

[Figure: Architecture summary with Informatica]

Do you already have Informatica licenses in the company, or are you evaluating a purchase for this specific project? If you are evaluating, I can also point you to some lighter alternatives (such as Talend or SAP BTP itself) for comparison.

Prompt

What similar solutions and alternatives to Informatica could be used?

Gemini

To replace or complement Informatica, several solutions exist that vary by budget, performance needs, and ease of use. Since your scenario involves extraction (Oracle DB) and API consumption (SAP REST), the alternatives fall into three categories:

1. Market leaders (enterprise iPaaS and ETL)

These tools compete directly with Informatica in robustness and security.

Talend (Qlik Talend Cloud):
- Why consider it: It is the closest alternative to Informatica, with powerful native components for SQL (Oracle) and for REST calls (SAP).
- Advantage: It generates native Java code, which is usually very fast for large volumes. The free Talend Open Studio edition was retired by Qlik in early 2024.
MuleSoft (Anypoint Platform):
- Why consider it: If the goal is to turn the integration into a reusable service. It excels at creating an "API layer" between Oracle and SAP.
- Advantage: Superior API management, making it easier to monitor who is accessing Oracle data.
Oracle Data Integrator (ODI):
- Why consider it: If the Oracle database is the "heart" of your operation. It uses an E-LT architecture, running transformations inside the Oracle database itself, which is extremely performant.
- Advantage: Deep native connectivity with Oracle ERP.

2. Vendors' "native" solutions

Using one side's own tool often simplifies licensing and support.

SAP Integration Suite (BTP):
- Best choice if: The end goal is to feed SAP securely. It already understands SAP data formats and has JDBC connectors to fetch data from Oracle.
Oracle Integration Cloud (OIC):
- Best choice if: The integration trigger originates in Oracle ERP. It has ready-made SAP adapters and makes visual mapping between the two systems' fields easier.

3. Open source and Modern Data Stack alternatives

For companies that prefer to avoid lock-in with large vendors or want lower costs.

Pentaho Data Integration (PDI / Kettle):
- Very popular in Brazil; excellent for reading databases and firing REST calls. Easier to learn than Informatica.
Apache Hop:
- A modern fork of Pentaho, focused on data orchestration and cloud compatibility.
Airbyte + dbt:
- The darling of the "Modern Data Stack". Airbyte handles ingestion (Oracle → staging), dbt transforms the staged data with SQL, and a Python script or integration tool consumes the SAP API.

Quick comparison

Tool	Complexity	Main focus	Estimated cost
Talend	Medium	Traditional and cloud ETL	Medium / High
MuleSoft	High	API connectivity	High
Pentaho	Low	Data integration	Low (open source)
SAP BTP	Medium	SAP ecosystem	Variable (consumption-based)
ODI	High	Oracle database performance	High

Which one to choose?

If you have a large data volume in Oracle: go with Talend or ODI.
If the priority is governance and security on the SAP side: go with SAP Integration Suite.
If the budget is tight and the team is technical: Pentaho or custom Python.

Would you like me to detail how one of these tools (such as Talend or SAP BTP) would make the technical connection between Oracle and the SAP API?

Java SQL Parser

January 27, 2026

Luis Fernando Chaim

Java

Prompt

Can you find best top 10 solutions to parse SQL in Java?

Gemini

Parsing SQL in Java depends heavily on your use case: do you need a lightweight tool to extract table names, or a full-blown query engine to optimize and execute queries?

Here are the top 10 solutions for parsing SQL in Java, categorized by their strengths and use cases.

1. JSqlParser (Top Choice for General Use)

JSqlParser is the most popular open-source library for Java. It uses JavaCC to translate SQL into a hierarchy of Java classes, allowing you to navigate the structure using the Visitor Pattern.

Best for: Modifying SQL, extracting metadata (tables, columns), and simple SQL translation.
Pros: RDBMS agnostic; supports MySQL, PostgreSQL, Oracle, SQL Server, and BigQuery.
Cons: Not a validator; it checks syntax against its own grammar but does not verify that tables or columns exist, and its grammar can be more permissive than a given database.
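To show what "navigating the structure with the Visitor Pattern" means, here is a self-contained sketch over a tiny hypothetical AST (`Table`, `Join`, `Select`); JSqlParser's real classes are far richer, and it ships a ready-made visitor for this exact job (`TablesNamesFinder`). The collector walks the tree and records every table name, including one inside a derived table.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, heavily simplified AST; JSqlParser's real classes differ.
interface Node {
    void accept(NodeVisitor v);
}

interface NodeVisitor {
    void visitSelect(Select s);
    void visitTable(Table t);
    void visitJoin(Join j);
}

record Table(String name) implements Node {
    public void accept(NodeVisitor v) { v.visitTable(this); }
}

record Join(Node left, Node right) implements Node {
    public void accept(NodeVisitor v) { v.visitJoin(this); }
}

// A Select can also appear as a FROM source (a derived table / subquery).
record Select(List<String> columns, Node from) implements Node {
    public void accept(NodeVisitor v) { v.visitSelect(this); }
}

/** Walks the tree and collects every table name, in order of first appearance. */
final class TableNameCollector implements NodeVisitor {
    final Set<String> tables = new LinkedHashSet<>();

    public void visitSelect(Select s) { s.from().accept(this); }

    public void visitTable(Table t) { tables.add(t.name()); }

    public void visitJoin(Join j) {
        j.left().accept(this);
        j.right().accept(this);
    }
}
```

Adding a new analysis (say, counting joins) means writing another visitor, without touching the node classes.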

2. Apache Calcite

Calcite is a dynamic data management framework. It doesn’t just parse; it provides a full SQL parser, validator, and cost-based optimizer.

Best for: Building your own database/query engine or performing deep semantic analysis.
Pros: Industry standard used by Apache Hive, Flink, and Druid.
Cons: Very steep learning curve; high resource overhead for simple tasks.

3. jOOQ (Parser API)

While famous as a DSL for writing SQL, jOOQ includes a powerful standalone parser. It can take a raw SQL string and convert it into a jOOQ Query object.

Best for: Dialect transpilation (e.g., converting Oracle SQL to PostgreSQL) and type-safe query manipulation.
Pros: Extremely robust; handles complex vendor-specific syntax.
Cons: Commercial license required for some features; library size is large.

4. General SQL Parser (GSP)

A commercial powerhouse, GSP is known for its incredible breadth of dialect support and deep analysis capabilities.

Best for: Enterprise-grade data lineage, SQL formatting, and complex refactoring.
Pros: Supports 20+ dialects (Snowflake, Hive, Teradata, etc.), with broader coverage than most open-source tools.
Cons: Expensive commercial license.

5. ANTLR (with Community SQL Grammars)

ANTLR is a parser generator. You don’t get a “SQL library” out of the box; instead, you take a .g4 grammar file (for MySQL, PL/SQL, etc.) and generate the Java code.

Best for: Custom SQL-like languages or when you need total control over the parsing logic.
Pros: Highly flexible; can generate parsers for several target languages, not just Java.
Cons: High manual effort; you must maintain the grammar files yourself.

6. H2 Database Parser

The H2 Database is written in Java and contains its own internal recursive-descent parser. You can use its internal classes (like org.h2.command.Parser) to parse SQL.

Best for: Projects already using H2 or needing a high-speed, hand-written parser.
Pros: Very fast and battle-tested in a production database.
Cons: Not designed as a standalone library; internal APIs can change.
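In a hand-written recursive-descent parser like H2’s, each grammar rule becomes a method that consumes tokens and calls the methods for its sub-rules. The toy parser below shows that structure for a tiny, hypothetical subset of SQL (`SELECT` columns `FROM` table, with an optional `WHERE column = number`). It does not use H2’s classes; `MiniSelectParser` and `SelectStmt` are illustrative names. The grammar is small enough that no rule calls itself; in a full SQL grammar, the rules for expressions and subqueries recurse.

```java
import java.util.ArrayList;
import java.util.List;

public class MiniSelectParser {

    // Hypothetical parse result; H2's real parser builds much richer command objects.
    record SelectStmt(List<String> columns, String table, String whereColumn, Long whereValue) {}

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    MiniSelectParser(String sql) {
        // Tokenizer: runs of letters/digits/underscores, or single punctuation characters.
        int i = 0;
        while (i < sql.length()) {
            char c = sql.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (isWordChar(c)) {
                int start = i;
                while (i < sql.length() && isWordChar(sql.charAt(i))) i++;
                tokens.add(sql.substring(start, i));
            } else {
                tokens.add(String.valueOf(c));
                i++;
            }
        }
    }

    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "";
    }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("Unexpected end of input");
        return tokens.get(pos++);
    }

    private void expect(String expected) {
        String t = next();
        if (!t.equalsIgnoreCase(expected)) {
            throw new IllegalArgumentException("Expected " + expected + " but found " + t);
        }
    }

    private String identifier() {
        String t = next();
        if (!Character.isLetter(t.charAt(0)) && t.charAt(0) != '_') {
            throw new IllegalArgumentException("Expected identifier but found " + t);
        }
        return t;
    }

    // select := SELECT columnList FROM identifier [WHERE identifier '=' number]
    SelectStmt parseSelect() {
        expect("SELECT");
        List<String> columns = parseColumnList();
        expect("FROM");
        String table = identifier();
        String whereColumn = null;
        Long whereValue = null;
        if (peek().equalsIgnoreCase("WHERE")) {
            next();
            whereColumn = identifier();
            expect("=");
            whereValue = Long.parseLong(next());
        }
        if (pos != tokens.size()) throw new IllegalArgumentException("Unexpected token: " + peek());
        return new SelectStmt(columns, table, whereColumn, whereValue);
    }

    // columnList := identifier (',' identifier)*
    private List<String> parseColumnList() {
        List<String> columns = new ArrayList<>();
        columns.add(identifier());
        while (peek().equals(",")) {
            next();
            columns.add(identifier());
        }
        return columns;
    }

    public static void main(String[] args) {
        SelectStmt stmt = new MiniSelectParser("SELECT id, name FROM users WHERE id = 42").parseSelect();
        System.out.println(stmt);
    }
}
```

Running `main` prints `SelectStmt[columns=[id, name], table=users, whereColumn=id, whereValue=42]`. Malformed input such as `SELECT id users` throws an `IllegalArgumentException` naming the token it did not expect, which is the kind of precise error reporting hand-written parsers are valued for.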

7. ZetaSQL

Developed by Google, ZetaSQL is the parser used for BigQuery and Spanner. It has been open-sourced with Java wrappers.

Best for: Projects that must match GoogleSQL, the dialect used by BigQuery and Spanner, exactly.
Pros: Precise semantic analysis and strict validation.
Cons: Primarily focused on the “GoogleSQL” dialect; complex setup.

8. Druid SQL Parser

Apache Druid contains a specialized parser built on Calcite but optimized for its own analytical needs.

Best for: Big data analytics and time-series query parsing.
Pros: Excellent performance for analytical (OLAP) queries.

9. Zql (The “Old School” Choice)

Zql is a lightweight SQL parser generated with JavaCC. It is much simpler (and older) than JSqlParser.

Best for: Very small, legacy, or resource-constrained applications.
Pros: Minimal footprint.
Cons: Not updated frequently; lacks support for modern SQL features like Window Functions or JSON operators.

10. Presto/Trino Parser

Trino (formerly PrestoSQL) builds its parser from an ANTLR 4 grammar and converts the resulting parse tree into its own syntax-tree classes.

Best for: High-performance distributed query processing.
Pros: Clean Abstract Syntax Tree (AST) for complex SELECT statements.

Comparison Summary

Tool	License	Best For	Complexity
JSqlParser	Apache/LGPL	General Metadata Extraction	Moderate
Calcite	Apache	Database Building / Optimization	High
jOOQ	Dual (Free/Paid)	SQL Transpilation	Moderate
GSP	Commercial	Enterprise Lineage / Formatting	Low (API-wise)
ANTLR	BSD	Custom Dialects	Very High

CSharp Keyence MD X2500

January 16, 2026

Luis Fernando Chaim

DotNet

Prompt

Create C# code to send QR code with text “QXD9000” for keyence MD X2500.

Abacus

Here’s C# code to send a QR code with text “QXD9000” to a Keyence MD-X2500 marker:

using System;
using System.IO;
using System.Net.Sockets;
using System.Text;

namespace KeyenceMarkerControl
{
    class Program
    {
        static void Main(string[] args)
        {
            string ipAddress = "192.168.0.10"; // Replace with your MD-X2500 IP address
            int port = 9004;                   // Assumed port; check the marker's Ethernet settings
            string qrText = "QXD9000";

            try
            {
                SendQRCodeToKeyence(ipAddress, port, qrText);
                Console.WriteLine("Commands sent; check the responses above for errors.");
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Error: {ex.Message}");
            }
        }

        static void SendQRCodeToKeyence(string ipAddress, int port, string qrText)
        {
            using (TcpClient client = new TcpClient())
            {
                // Fail with an IOException instead of blocking forever if the marker stops answering
                client.SendTimeout = 5000;
                client.ReceiveTimeout = 5000;

                // Connect to the Keyence marker
                client.Connect(ipAddress, port);

                using (NetworkStream stream = client.GetStream())
                {
                    // Set the QR code data.
                    // NOTE: "WD,<data>\r" is an assumed command format; verify it
                    // against the MD-X2500 communication manual before use.
                    SendCommand(stream, $"WD,{qrText}\r");

                    // Optional: issue the print (marking) command ("IP\r" is also assumed)
                    SendCommand(stream, "IP\r");
                }
            }
        }

        // Sends one CR-terminated command and returns the marker's reply.
        static string SendCommand(NetworkStream stream, string command)
        {
            byte[] data = Encoding.ASCII.GetBytes(command);
            stream.Write(data, 0, data.Length);
            Console.WriteLine($"Sent: {command.Trim()}");

            // TCP is a byte stream, so a single Read() can return a partial reply.
            // Keep reading until the terminating CR (assumed reply terminator) arrives.
            var response = new StringBuilder();
            int b;
            while ((b = stream.ReadByte()) != -1 && b != '\r')
            {
                response.Append((char)b);
            }
            if (b == -1)
            {
                throw new IOException("Connection closed before a complete response was received.");
            }

            Console.WriteLine($"Response: {response}");
            return response.ToString();
        }
    }
}

Key points:

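IP Address & Port: Update ipAddress with your MD-X2500’s actual IP address. The code assumes port 9004; confirm the command port configured in the marker’s Ethernet settings.
Commands (assumed formats; verify them in Keyence’s MD-X communication manual):
- WD,<data>\r – Write data to the marker
- IP\r – Issue print command to mark the QR code
Other commands you might need (also unverified):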
- WP,<program_number>\r – Select a specific program/pattern
- RC\r – Remote control ON
- RF\r – Remote control OFF
Configuration: Ensure your MD-X2500 is configured for:
- Ethernet communication enabled
- QR code pattern set up in the device
- Remote control mode enabled

Adjust the commands based on your specific MD-X2500 configuration and communication protocol settings.