Plugin System: Custom Content Parsers
Sherwood's plugin system allows extending content parsing beyond Markdown by creating custom parsers and registering them with file extensions.
Overview
The plugin system uses compile-time registration for zero runtime overhead while enabling complete flexibility in content formats.
Creating Custom Parsers
Parser Structure
All parsers must implement the ContentParser trait:
use sherwood::plugins::{ContentParser, ParsedContent};
use sherwood::content::parser::Frontmatter;
use anyhow::Result;
use std::path::Path;
use std::collections::HashMap;
pub struct MyParser;
impl MyParser {
pub fn new() -> Box<dyn ContentParser> {
Box::new(Self)
}
}
impl ContentParser for MyParser {
fn name(&self) -> &'static str {
"myformat" // Unique identifier for error messages
}
fn parse(&self, content: &str, _path: &Path) -> Result<ParsedContent> {
// 1. Parse your content format
// 2. Extract title (optional - will fallback to filename)
// 3. Map to Sherwood frontmatter
// 4. Return structured result
let frontmatter = Frontmatter {
title: extract_title_from_content(content),
date: extract_date_from_content(content),
list: None,
page_template: None,
sort_by: None,
sort_order: None,
tags: extract_tags_from_content(content),
};
Ok(ParsedContent {
title: String::new(), // Let filename fallback work
frontmatter,
content: process_content(content), // Your content transformation
metadata: HashMap::new(),
})
}
}
The Box<dyn ContentParser> Pattern
Purpose: Type erasure for heterogeneous storage
Each parser has different memory layout, but Box<dyn ContentParser> creates a uniform "handle":
// Different parser types:
TomlParser // 24 bytes
JsonParser // 32 bytes
TextParser // 16 bytes
// All become same type after boxing:
Box<dyn ContentParser> // Same size for all, uniform interface
Benefits:
- Runtime polymorphism: Same interface, different implementations
- Memory efficiency: One allocation per parser, reused many times
- Type safety: Rust guarantees all boxed types implement
ContentParser - Extensibility: Unlimited parser types without registry changes
Return Types Explained
ParsedContent
Standardized structure for all parsers:
pub struct ParsedContent {
pub title: String, // Document title (filename fallback)
pub frontmatter: Frontmatter, // Sherwood frontmatter fields
pub content: String, // Processed content body
pub metadata: HashMap<String, String>, // Custom parser data
}
Frontmatter
Common metadata fields across all parsers:
pub struct Frontmatter {
pub title: Option<String>, // Document title
pub date: Option<String>, // Publication date
pub list: Option<bool>, // Is this a list page?
pub page_template: Option<String>, // Custom template
pub sort_by: Option<String>, // Sorting field
pub sort_order: Option<String>, // Sort direction
pub tags: Option<Vec<String>>, // Content tags
}
Parser Registration
Registration in src/main.rs
mod parsers;
use parsers::{TomlContentParser, JsonContentParser, TextContentParser, MyParser};
use sherwood::plugins::PluginRegistry;
#[tokio::main]
async fn main() {
let plugin_registry = PluginRegistry::new()
// Register parsers with primary extensions
.register("toml", TomlContentParser::new(), "toml")
.register("json", JsonContentParser::new(), "json")
.register("text", TextContentParser::new(), "txt")
.register("myformat", MyParser::new(), "myext")
// Map additional extensions to existing parsers
.map_extensions(&[
("conf", "toml"), // .conf files use TOML parser
("config", "toml"), // .config files use TOML parser
("schema", "json"), // .schema files use JSON parser
("note", "text"), // .note files use text parser
("myext2", "myformat"), // Additional extension for custom parser
]);
let cli = sherwood::SherwoodCli::new("myapp", "My static site generator")
.with_plugins(plugin_registry);
if let Err(e) = cli.run().await {
eprintln!("Error: {}", e);
std::process::exit(1);
}
}
Module Exports in src/parsers/mod.rs
pub mod toml;
pub mod json;
pub mod txt;
pub mod my_parser;
// Re-export for convenience
pub use toml::TomlContentParser;
pub use json::JsonContentParser;
pub use txt::TextContentParser;
pub use my_parser::MyParser;
File Organization
Directory Structure
docs/
├── src/
│ ├── main.rs # Plugin registration
│ └── parsers/
│ ├── mod.rs # Module declarations & re-exports
│ ├── toml.rs # TOML parser example
│ ├── json.rs # JSON parser example
│ ├── txt.rs # Text parser example
│ └── my_parser.rs # Your custom parser
├── content/
│ ├── example.toml # Will be processed by TOML parser
│ ├── data.json # Will be processed by JSON parser
│ ├── notes.txt # Will be processed by text parser
│ └── data.myext # Will be processed by custom parser
└── Sherwood.toml # Site configuration
Registration Patterns
Single Extension
.register("toml", TomlContentParser::new(), "toml")
Multiple Extensions
.register("text", TextContentParser::new(), "txt")
.map_extensions(&[
("note", "text"),
("readme", "text"),
("md", "text"), // Override built-in parser
])
Conditional Registration
let mut registry = PluginRegistry::new();
if cfg!(feature = "yaml-support") {
registry = registry.register("yaml", YamlParser::new(), "yaml");
}
if cfg!(feature = "csv-support") {
registry = registry.register("csv", CsvParser::new(), "csv");
}
Example Parsers
TOML Parser (toml.rs)
Handles structured data with frontmatter fields + content body:
title = "TOML Content Page"
date = "2024-01-05"
description = "This is a TOML content file example"
content = "This is the main content of TOML file.\n\n## Subheading\n\nMore content here in markdown format."
JSON Parser (json.rs)
Alternative structured data format:
{
"title": "JSON Content Page",
"date": "2024-01-05",
"description": "This is a JSON content file example",
"content": "# JSON Content\n\nThis is the main content of JSON file.\n\n## Subheading\n\nMore content here in markdown format."
}
Text Parser (txt.rs)
Pure document content with no frontmatter processing:
API Documentation: Authentication Endpoints
This document describes the authentication API endpoints for our web service.
## Login Endpoint
POST /api/auth/login
### Request Body
```json
{
"username": "string",
"password": "string"
}
Response
{
"token": "jwt_token_string",
"expires_in": 3600
}
## Parser Implementation Strategies
### Data-First Parsers (TOML/JSON)
**Characteristics**:
- Structured data format
- Separate frontmatter and content
- Field validation and type safety
**Implementation Pattern**:
```rust
fn parse(&self, content: &str, path: &Path) -> Result<ParsedContent> {
let parsed: serde_json::Value = serde_json::from_str(content)?;
let frontmatter = Frontmatter {
title: parsed.get("title").and_then(|v| v.as_str()).map(|s| s.to_string()),
date: parsed.get("date").and_then(|v| v.as_str()).map(|s| s.to_string()),
// ... map other fields
};
Ok(ParsedContent {
title: String::new(), // Filename fallback
frontmatter,
content: parsed
.get("content")
.and_then(|v| v.as_str())
.unwrap_or("")
.to_string(),
metadata: HashMap::new(),
})
}
Document Parsers (TXT/Markdown)
Characteristics:
- Unstructured text content
- Optional frontmatter markers
- Content preservation focus
Implementation Pattern:
fn parse(&self, content: &str, _path: &Path) -> Result<ParsedContent> {
// No frontmatter parsing - entire file is content
let frontmatter = Frontmatter::default(); // Empty frontmatter
Ok(ParsedContent {
title: String::new(), // Will be overridden by filename fallback
frontmatter,
content: content.to_string(), // Preserve exactly
metadata: HashMap::new(),
})
}
Custom Format Parsers
Characteristics:
- Proprietary data format
- Custom transformation logic
- Domain-specific requirements
Implementation Pattern:
fn parse(&self, content: &str, path: &Path) -> Result<ParsedContent> {
// Custom parsing logic for your format
let parsed = parse_my_custom_format(content)?;
// Map your format's fields to Sherwood frontmatter
let frontmatter = Frontmatter {
title: parsed.title,
date: parsed.date,
list: parsed.is_list_page,
// ... map your format's metadata
};
Ok(ParsedContent {
title: String::new(),
frontmatter,
content: transform_to_html(parsed.content),
metadata: parsed.custom_metadata,
})
}
Testing Your Parser
Unit Tests
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_parser_parsing() {
let parser = MyParser::new();
let content = "your format content";
let path = Path::new("test.myext");
let result = parser.parse(content, path);
assert!(result.is_ok());
let parsed = result.unwrap();
assert_eq!(parsed.title, "test");
// More assertions for content, frontmatter...
}
#[test]
fn test_error_handling() {
let parser = MyParser::new();
let invalid_content = "malformed content";
let path = Path::new("test.myext");
let result = parser.parse(invalid_content, path);
assert!(result.is_err());
}
}
Integration Tests
# Create test file
echo "title: Test Content" > content/test.myext
# Run Sherwood
cargo run -- generate
# Check output
ls dist/test.html
cat dist/test.html
Best Practices
Design Principles
- Keep parsers focused: One format per parser
- Handle errors gracefully: Use
anyhowfor context - Preserve content: Don't over-process unless necessary
- Document format: Include example files in
content/ - Test thoroughly: Unit + integration tests
Error Handling
fn parse(&self, content: &str, path: &Path) -> Result<ParsedContent> {
let parsed = parse_my_format(content)
.map_err(|e| anyhow::anyhow!("Failed to parse {}: {}", path.display(), e))?;
Ok(transform_to_parsed_content(parsed))
}
Performance Considerations
- One allocation:
Box::new(Self)innew()method - Efficient parsing: Use streaming for large files if needed
- Memory reuse: Parser instance reused for all files
- Error early: Validate format before expensive operations
Configuration Integration
# Sherwood.toml
[plugins]
# Future: plugin-specific configuration
[plugins.myformat]
enable_transforms = true
output_format = "html"
Advanced Features
Content Transformation
fn transform_content(content: &str) -> String {
// Apply custom transformations:
// - Auto-link URLs
// - Syntax highlighting markers
// - Smart paragraph detection
// - Custom HTML injection
content.to_string()
}
Custom Metadata
fn create_metadata(parsed: &MyParsedFormat) -> HashMap<String, String> {
let mut metadata = HashMap::new();
metadata.insert("word_count".to_string(), parsed.word_count.to_string());
metadata.insert("reading_time".to_string(), parsed.reading_time.to_string());
metadata.insert("difficulty".to_string(), parsed.difficulty.to_string());
metadata
}
Template Integration
impl ContentParser for MyParser {
fn parse(&self, content: &str, path: &Path) -> Result<ParsedContent> {
let mut parsed = base_parse(content, path)?;
// Specify custom template for this format
parsed.frontmatter.page_template = Some("my_template.stpl".to_string());
Ok(parsed)
}
}
This plugin system provides maximum flexibility while maintaining zero overhead and type safety!