Input Validation and Sanitization: Preventing Injection Attacks in Web Apps

Input validation is the first gatekeeper of application security. Every piece of data that enters your application — from form fields to API payloads to file uploads — is a potential attack vector until proven otherwise.

Whitelist Over Blacklist

The most important principle in input validation is to whitelist rather than blacklist. A whitelist defines exactly what is allowed and rejects everything else. A blacklist attempts to enumerate what is dangerous, which inevitably misses novel attack patterns.

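import { z } from 'zod';

// Illustrative subset; in practice load the full ISO 3166-1 alpha-2 list
const COUNTRY_CODES = ['US', 'CA', 'GB', 'DE', 'FR'];

// Whitelist validation: prefer this approach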
const CountryCodeSchema = z.string()
  .length(2)
  .toUpperCase()
  .refine((val) => COUNTRY_CODES.includes(val), {
    message: 'Invalid country code',
  });

const UsernameSchema = z.string()
  .min(3)
  .max(30)
  .regex(/^[a-zA-Z0-9_]+$/, {
    message: 'Username must contain only letters, numbers, and underscores',
  });

Whitelist validation applies to every input type. For select dropdowns and radio buttons, validate against the list of acceptable values rather than trusting that the client sent a valid option. For free-text fields, enforce character class restrictions appropriate to the field's purpose: phone numbers need digits and a few formatting characters, email addresses need the email format, and names need letters, spaces, and common punctuation such as hyphens and apostrophes.
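As a minimal sketch of the dropdown case, the check below treats the submitted value as untrusted and compares it against the server-side list of options. The `SHIPPING_METHODS` list and the `parseShippingMethod` name are hypothetical, chosen only for illustration.

```typescript
// Hypothetical list of options rendered in a <select> element
const SHIPPING_METHODS = ['standard', 'express', 'overnight'] as const;
type ShippingMethod = (typeof SHIPPING_METHODS)[number];

// Returns the value narrowed to ShippingMethod, or null if it is not on the list
function parseShippingMethod(value: unknown): ShippingMethod | null {
  if (typeof value !== 'string') {
    return null;
  }
  const match = SHIPPING_METHODS.find((method) => method === value);
  return match ?? null;
}
```

Because the comparison is exact, case variants and non-string payloads are rejected rather than coerced into something that looks valid.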

Schema Validation Libraries

Schema-first validation libraries provide declarative, composable input validation that catches type errors, missing fields, and constraint violations before application logic executes.

import { z } from 'zod';
import type { Request, Response } from 'express';

// Comprehensive request schema
const CreateOrderSchema = z.object({
  userId: z.string().uuid(),
  items: z.array(z.object({
    productId: z.string().uuid(),
    quantity: z.number().int().positive().max(99),
    price: z.number().positive().multipleOf(0.01),
  })).min(1).max(50),
  shippingAddress: z.object({
    street: z.string().min(5).max(200),
    city: z.string().min(2).max(100),
    postalCode: z.string().regex(/^\d{5}(-\d{4})?$/),
    country: z.string().length(2),
  }),
  couponCode: z.string().max(20).optional(),
  agreeToTerms: z.literal(true),
});

// Validate at the boundary — before any processing
function createOrderHandler(req: Request, res: Response) {
  const result = CreateOrderSchema.safeParse(req.body);

  if (!result.success) {
    return res.status(422).json({
      error: 'Validation failed',
      details: result.error.issues.map((issue) => ({
        path: issue.path.join('.'),
        message: issue.message,
      })),
    });
  }

  const orderData = result.data;
  // Proceed with validated, typed data
}

Other popular validation libraries include Ajv (JSON Schema), Joi, Yup, and Valibot. Each offers different trade-offs between bundle size, performance, and TypeScript integration. Zod and Valibot excel in TypeScript-first projects, while Ajv is preferred for JSON Schema compliance.

SQL and NoSQL Injection Prevention

Parameterized queries are the definitive defense against SQL injection. The driver keeps the SQL text and the parameter values separate (or escapes the values on your behalf), so user input is treated as data and is never parsed as SQL syntax.

// Parameterized query — safe
await db.query(
  'INSERT INTO products (name, price, category_id) VALUES ($1, $2, $3)',
  [name, price, categoryId]
);
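
Placeholders can only stand in for values, not for identifiers such as column names or sort directions. When those must come from the request, one option is to map the input onto an allowlist and keep everything else parameterized. The sketch below assumes a hypothetical product listing endpoint; `buildProductListQuery`, `SORTABLE_COLUMNS`, and the column names are illustrative, and the returned object mirrors the text-plus-values shape passed to `db.query` above.

```typescript
// Hypothetical allowlist mapping client-facing sort keys to real column names
const SORTABLE_COLUMNS: Record<string, string> = {
  name: 'name',
  price: 'price',
  newest: 'created_at',
};

interface BuiltQuery {
  text: string;
  values: unknown[];
}

function buildProductListQuery(categoryId: string, sortKey: string, descending: boolean): BuiltQuery {
  // Own-property check so keys like "constructor" do not resolve via the prototype
  if (!Object.prototype.hasOwnProperty.call(SORTABLE_COLUMNS, sortKey)) {
    throw new Error('Unsupported sort key');
  }
  const column = SORTABLE_COLUMNS[sortKey];
  const direction = descending ? 'DESC' : 'ASC';
  return {
    // Only allowlisted identifiers are interpolated; the value stays a parameter
    text: `SELECT id, name, price FROM products WHERE category_id = $1 ORDER BY ${column} ${direction}`,
    values: [categoryId],
  };
}
```

The user-supplied `sortKey` never appears in the SQL text; only the column name it maps to does.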

The same principle applies to NoSQL databases. MongoDB query operators like $where, $regex, and $ne can be exploited if user input is passed directly into query objects. Use schema validation and avoid raw query operators in user-facing endpoints.

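// MongoDB: reject operator objects before input reaches a query
class ValidationError extends Error {}

function sanitizeMongoQuery(input: Record<string, unknown>): Record<string, unknown> {
  const sanitized: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(input)) {
    // Reject query operators used as keys, such as $where or $or
    if (key.startsWith('$')) {
      throw new ValidationError('Query operators not allowed');
    }
    // Accept only primitives, so { password: { $ne: null } } cannot slip through
    if (value !== null && typeof value === 'object') {
      throw new ValidationError('Nested objects not allowed in query values');
    }
    // Strings are matched literally in equality queries, so no regex escaping is needed
    sanitized[key] = value;
  }
  return sanitized;
}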

HTML Sanitization for User-Generated Content

When accepting HTML content from users — blog comments, forum posts, rich text editors — raw HTML can contain XSS vectors. Use a robust HTML sanitization library with an allowlist approach.

import sanitizeHtml from 'sanitize-html';

function sanitizeUserContent(html: string): string {
  return sanitizeHtml(html, {
    allowedTags: [
      'b', 'i', 'em', 'strong', 'a', 'p', 'br',
      'ul', 'ol', 'li', 'blockquote', 'code', 'pre',
    ],
    allowedAttributes: {
      'a': ['href', 'rel', 'target'], // target is added in transformTags and must be allowed to survive filtering
    },
    allowedSchemes: ['https', 'mailto'],
    transformTags: {
      'a': (tagName, attribs) => ({
        tagName,
        attribs: {
          ...attribs,
          rel: 'noopener noreferrer nofollow',
          target: '_blank',
        },
      }),
    },
  });
}

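Strip event handlers (onclick, onerror, onload), javascript: URLs, and <script>, <iframe>, and <object> tags entirely. Consider using Markdown instead of raw HTML for user content: it provides rich formatting with a much smaller XSS surface, although the rendered HTML should still be sanitized because many Markdown parsers pass inline HTML and javascript: links through.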
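
To make the javascript: URL rule concrete, here is a small sketch of a scheme check built on the standard `URL` parser. The `isSafeHref` name and the decision to reject relative URLs are assumptions for illustration; the sanitization library shown above performs a comparable check through its `allowedSchemes` option.

```typescript
const ALLOWED_PROTOCOLS = new Set(['https:', 'mailto:']);

function isSafeHref(href: string): boolean {
  let parsed: URL;
  try {
    // The parser trims surrounding whitespace, drops embedded tabs and newlines,
    // and lowercases the scheme, which defeats common obfuscations
    parsed = new URL(href);
  } catch (err) {
    return false; // relative or malformed URLs are rejected in this sketch
  }
  return ALLOWED_PROTOCOLS.has(parsed.protocol);
}
```

Checking the parsed protocol rather than matching the raw string with a regular expression avoids having to anticipate every way a scheme can be disguised.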

File Upload Validation

File upload endpoints are notoriously vulnerable to injection attacks. Validate file content at multiple layers — MIME type, magic bytes, file extension, and content scanning.

import path from 'node:path';
import { fileTypeFromBuffer } from 'file-type';

// Assumes multer's memoryStorage, which populates file.buffer
async function validateFileUpload(file: Express.Multer.File): Promise<void> {
  const MAX_SIZE = 10 * 1024 * 1024; // 10 MB
  const ALLOWED_MIMES = ['image/jpeg', 'image/png', 'image/webp', 'application/pdf'];

  // Check file size
  if (file.size > MAX_SIZE) {
    throw new ValidationError('File exceeds maximum size of 10 MB');
  }

  // Verify magic bytes — not just Content-Type header
  const type = await fileTypeFromBuffer(file.buffer);
  if (!type || !ALLOWED_MIMES.includes(type.mime)) {
    throw new ValidationError('File type not allowed');
  }

  // Verify extension matches actual type
  const ext = path.extname(file.originalname).toLowerCase();
  const expectedExts: Record<string, string[]> = {
    'image/jpeg': ['.jpg', '.jpeg'],
    'image/png': ['.png'],
    'image/webp': ['.webp'],
    'application/pdf': ['.pdf'],
  };

  if (!expectedExts[type.mime]?.includes(ext)) {
    throw new ValidationError('File extension does not match content type');
  }
}
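
To show what a magic-byte check does under the hood, here is a dependency-free sketch that compares the leading bytes of a buffer against well-known signatures. `sniffMimeType` is a hypothetical helper covering only three formats; a maintained library such as `file-type` handles far more formats and edge cases.

```typescript
// Well-known leading byte signatures
const SIGNATURES: Array<{ mime: string; bytes: number[] }> = [
  { mime: 'image/png', bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { mime: 'image/jpeg', bytes: [0xff, 0xd8, 0xff] },
  { mime: 'application/pdf', bytes: [0x25, 0x50, 0x44, 0x46, 0x2d] }, // "%PDF-"
];

// Returns the detected MIME type, or null if no signature matches
function sniffMimeType(data: Uint8Array): string | null {
  for (const { mime, bytes } of SIGNATURES) {
    if (data.length >= bytes.length && bytes.every((b, i) => data[i] === b)) {
      return mime;
    }
  }
  return null;
}
```

A file named `avatar.png` whose first bytes spell `<?php` matches none of the signatures, so it is rejected regardless of its extension or declared Content-Type.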

Normalization Before Validation

Attackers often use Unicode tricks to bypass validation: homoglyph characters, zero-width spaces, and bidirectional override characters. Normalize input to a canonical form before validation, and strip or reject invisible characters, since normalization alone does not remove them.

function normalizeAndValidate(input: string, schema: z.ZodType<string>): string {
  // NFC gives one canonical encoding per character; it does not remove invisible characters
  const normalized = input.normalize('NFC').trim();
  return schema.parse(normalized);
}

For email addresses, lowercase the domain (and the local part too, if your system treats addresses as case-insensitive, as most do) and validate against the RFC 5321 address syntax. For URLs, parse, validate the structure, and normalize the format. For identifiers, strip invisible Unicode characters that could be used for impersonation.
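
A minimal sketch of the identifier case follows. It applies NFKC (compatibility) normalization, which folds variants such as fullwidth letters, then removes zero-width and bidirectional control characters. The `normalizeIdentifier` name and the exact character ranges are assumptions; cross-script homoglyphs (for example Cyrillic letters that resemble Latin ones) are not handled here and need a separate confusables check.

```typescript
// Zero-width characters (U+200B to U+200D, U+2060, U+FEFF) and
// bidirectional controls (U+200E, U+200F, U+202A to U+202E, U+2066 to U+2069)
const INVISIBLE_CHARS = /[\u200B-\u200F\u202A-\u202E\u2060\u2066-\u2069\uFEFF]/g;

function normalizeIdentifier(input: string): string {
  return input.normalize('NFKC').replace(INVISIBLE_CHARS, '').trim();
}
```

Stripping is suited to identifiers such as usernames. For free text, zero-width joiners can be legitimate (some scripts and emoji sequences rely on them), so rejecting or preserving them may be the better choice there.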

Defense in Depth

Input validation at the application boundary is essential, but it should not be the only defense. Apply validation at multiple layers — client-side for UX, API gateway for bulk filtering, application layer for semantic validation, and the database layer through constraints and parameterized queries.

// Multi-layer: API gateway schema + application validation + DB constraints
// API Gateway: Restrict payload size, reject malformed JSON
// Application: Semantic validation of business rules
if (order.total > user.dailyLimit) {
  throw new BusinessRuleError('Exceeds daily spending limit');
}
// Database: Foreign keys, CHECK constraints, NOT NULL

Rejecting Unknown Fields

API endpoints should reject requests containing unexpected fields. An attacker who adds an extra field to a payload could exploit a mass assignment vulnerability to escalate privileges, or trigger internal API behavior that was never meant to be exposed.

// Zod strips unknown keys silently by default; .strict() turns them into a validation error
const UpdateUserSchema = z.object({
  name: z.string().optional(),
  email: z.string().email().optional(),
}).strict(); // Rejects unknown fields
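
The effect of strict mode can be sketched without a library: compare the keys of the incoming object against the allowed set and refuse the request if anything extra is present. `findUnknownKeys`, `UPDATE_USER_FIELDS`, and the `role` field below are hypothetical, used only to illustrate a mass assignment attempt.

```typescript
const UPDATE_USER_FIELDS = new Set(['name', 'email']);

// Returns the keys present in the payload that are not on the allowlist
function findUnknownKeys(payload: Record<string, unknown>, allowed: Set<string>): string[] {
  return Object.keys(payload).filter((key) => !allowed.has(key));
}

// A client tries to smuggle in a privileged field alongside a legitimate one
const attemptedUpdate = JSON.parse('{"name": "Mallory", "role": "admin"}') as Record<string, unknown>;
const unknownKeys = findUnknownKeys(attemptedUpdate, UPDATE_USER_FIELDS);
// unknownKeys holds only "role", so the handler should respond with a validation error
```

Rejecting the request outright, rather than silently dropping the extra field, also surfaces probing attempts in logs.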

Input validation and sanitization form the outermost layer of application defense. Our web development services (/services/web-development) include comprehensive input validation architecture for every endpoint. Contact SoniNow to build applications that treat every input as potentially hostile.